2024-08-09 12:33:18,271 INFO [train_multi_KD3.py:1187] (1/4) Training started 2024-08-09 12:33:18,271 INFO [train_multi_KD3.py:1197] (1/4) Device: cuda:1 2024-08-09 12:33:18,275 INFO [train_multi_KD3.py:1212] (1/4) Using dtype=torch.bfloat16 2024-08-09 12:33:18,276 INFO [train_multi_KD3.py:1214] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True} 2024-08-09 12:33:18,276 INFO [train_multi_KD3.py:1216] (1/4) About 
to create model 2024-08-09 12:33:18,696 INFO [model_shift.py:142] (1/4) Delta_t: 6 when computing the distillation loss 2024-08-09 12:33:18,700 INFO [train_multi_KD3.py:1220] (1/4) Number of model parameters: 66484678 2024-08-09 12:33:20,497 INFO [train_multi_KD3.py:1235] (1/4) Using DDP 2024-08-09 12:33:22,046 INFO [kd_datamodule.py:690] (1/4) About to get train 960 cuts 2024-08-09 12:33:22,097 INFO [train_multi_KD3.py:1306] (1/4) Getting audioset cuts 2024-08-09 12:33:22,097 INFO [kd_datamodule.py:900] (1/4) About to get the audioset cuts for KD. 2024-08-09 12:33:22,100 INFO [kd_datamodule.py:869] (1/4) About to get the voxceleb cuts. 2024-08-09 12:33:22,102 INFO [kd_datamodule.py:880] (1/4) Adding voxceleb2 cuts. 2024-08-09 12:33:22,103 INFO [train_multi_KD3.py:1320] (1/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True 2024-08-09 12:33:30,840 INFO [train_multi_KD3.py:1322] (1/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]] 2024-08-09 12:33:30,840 INFO [train_multi_KD3.py:1323] (1/4) Using weights: [1406195, 1904746, 1187704] 2024-08-09 12:33:30,840 INFO [train_multi_KD3.py:1332] (1/4) CutSet(len=4498645) [underlying data type: ] 2024-08-09 12:33:30,840 INFO [kd_datamodule.py:449] (1/4) Disable MUSAN 2024-08-09 12:33:30,841 INFO [kd_datamodule.py:489] (1/4) Disable SpecAugment 2024-08-09 12:33:30,842 INFO [kd_datamodule.py:491] (1/4) About to create train dataset 2024-08-09 12:33:30,842 INFO [kd_datamodule.py:528] (1/4) Using SimpleCutSampler 2024-08-09 12:33:30,843 INFO [kd_datamodule.py:536] (1/4) About to create train dataloader 2024-08-09 12:33:30,845 INFO [kd_datamodule.py:763] (1/4) About to get dev-clean cuts 2024-08-09 12:33:30,847 INFO [kd_datamodule.py:781] (1/4) About to get dev-other cuts 2024-08-09 12:33:30,848 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-09 
12:33:31,123 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-09 12:33:31,123 INFO [kd_datamodule.py:840] (1/4) About to get the test set of voxceleb1 set. 2024-08-09 12:33:31,127 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-09 12:33:31,361 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-09 12:33:31,361 INFO [kd_datamodule.py:912] (1/4) About to get the audioset eval cuts. 2024-08-09 12:33:31,366 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-09 12:33:31,834 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-09 12:33:31,834 INFO [train_multi_KD3.py:1412] (1/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-09 12:33:47,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 0, loss[loss=1.169, beats_loss=0.8428, ecapa_loss=0.002277, whisper_loss=0.3036, over 20291.00 frames. ], tot_loss[loss=1.169, beats_loss=0.8428, ecapa_loss=0.002277, whisper_loss=0.3036, over 20291.00 frames. ], batch size: 83, lr: 2.25e-02, grad_scale: 2.0 2024-08-09 12:33:47,517 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 12:34:33,682 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on ASR_libri: loss=0.9193, beats_loss=0, ecapa_loss=0.006113, whisper_loss=0.8581, over 922467.00 frames. 2024-08-09 12:34:48,241 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on SV_voxceleb1: loss=0.05055, beats_loss=0, ecapa_loss=0.005055, whisper_loss=0, over 939242.00 frames. 2024-08-09 12:35:31,954 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3538, 5.3835, 5.3121, 5.3739], device='cuda:1') 2024-08-09 12:36:59,528 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on AT_audioset: loss=1.752, beats_loss=1.752, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
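A note on reading the loss records above: the logged `loss` is consistent with a scale-weighted sum of the three distillation losses using the scales from the config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). This is a minimal sketch assuming that reduction; the exact combination in `train_multi_KD3.py` may differ in details, and `combined_kd_loss` is a hypothetical helper name.

```python
# Hypothetical sketch: how the logged tot_loss relates to the per-teacher
# losses, using the scales from the config above (beats_loss_scale=1.0,
# ecapa_loss_scale=10.0, whisper_loss_scale=1.0). The actual reduction in
# train_multi_KD3.py may differ; this only reproduces the logged numbers.
def combined_kd_loss(beats_loss, ecapa_loss, whisper_loss,
                     beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Batch 0 above logs beats_loss=0.8428, ecapa_loss=0.002277,
# whisper_loss=0.3036 and loss=1.169:
loss = combined_kd_loss(0.8428, 0.002277, 0.3036)
print(round(loss, 3))  # 1.169
```

With these scales the batch-0 numbers add up to the logged total, which suggests the displayed `ecapa_loss` is pre-scaling while the displayed `loss` includes the 10x factor.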
2024-08-09 12:36:59,530 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 12:37:03,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=7.5 2024-08-09 12:37:06,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=0.0, ans=0.5 2024-08-09 12:37:07,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=3.0 2024-08-09 12:37:17,543 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 from AS 2024-08-09 12:37:18,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=0.0, ans=0.9 2024-08-09 12:37:19,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=12.76 vs. limit=5.0 2024-08-09 12:37:51,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=200.0, ans=0.1925 2024-08-09 12:37:55,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=240.28 vs. limit=7.65 2024-08-09 12:37:58,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=328.14 vs. limit=7.575 2024-08-09 12:38:00,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=200.0, ans=5.125 2024-08-09 12:38:11,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=24.19 vs. 
limit=4.08 2024-08-09 12:38:14,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=200.0, ans=5.1 2024-08-09 12:38:18,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=300.0, ans=0.8895000000000001 2024-08-09 12:38:28,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=209.54 vs. limit=7.6125 2024-08-09 12:38:40,509 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-09 12:38:40,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=39.88 vs. limit=7.65 2024-08-09 12:38:45,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=400.0, ans=0.20600000000000002 2024-08-09 12:38:45,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=400.0, ans=0.02 2024-08-09 12:38:49,588 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 from AS 2024-08-09 12:38:52,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=78.09 vs. limit=5.1 2024-08-09 12:38:54,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=4.16 2024-08-09 12:39:03,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=291.91 vs. 
limit=7.6875 2024-08-09 12:39:04,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 50, loss[loss=0.2144, beats_loss=0.02306, ecapa_loss=0.001862, whisper_loss=0.1727, over 21670.00 frames. ], tot_loss[loss=0.3389, beats_loss=0.1293, ecapa_loss=0.001963, whisper_loss=0.19, over 873610.07 frames. ], batch size: 88, lr: 2.48e-02, grad_scale: 2.0 2024-08-09 12:39:08,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=427.49 vs. limit=7.6875 2024-08-09 12:39:21,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=31.15 vs. limit=4.2 2024-08-09 12:39:21,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=450.10 vs. limit=7.875 2024-08-09 12:39:26,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=228.72 vs. limit=7.725 2024-08-09 12:39:30,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=30.53 vs. limit=4.24 2024-08-09 12:39:32,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=600.0, ans=0.471875 2024-08-09 12:39:40,500 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS 2024-08-09 12:39:49,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=465.11 vs. limit=7.7625 2024-08-09 12:39:49,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=247.44 vs. 
limit=5.35 2024-08-09 12:40:04,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700.0, ans=0.293 2024-08-09 12:40:10,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=235.26 vs. limit=7.8 2024-08-09 12:40:13,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=3.12 2024-08-09 12:40:18,083 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 12:40:18,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=186.84 vs. limit=7.8 2024-08-09 12:40:27,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=800.0, ans=0.872 2024-08-09 12:40:44,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=900.0, ans=0.8685 2024-08-09 12:40:44,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=445.37 vs. limit=7.8375 2024-08-09 12:40:45,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=23.86 vs. limit=5.225 2024-08-09 12:40:51,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.930e+01 4.445e+01 8.118e+01 2.890e+03, threshold=8.890e+01, percent-clipped=0.0 2024-08-09 12:40:51,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 100, loss[loss=0.1837, beats_loss=0.02021, ecapa_loss=0.001856, whisper_loss=0.145, over 17478.00 frames. 
], tot_loss[loss=0.2691, beats_loss=0.069, ecapa_loss=0.00193, whisper_loss=0.1808, over 1530929.99 frames. ], batch size: 69, lr: 2.70e-02, grad_scale: 4.0 2024-08-09 12:40:58,945 WARNING [optim.py:496] (1/4) Scaling gradients by 0.048358626663684845, model_norm_threshold=88.8975601196289 2024-08-09 12:40:59,117 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.2.norm.log_scale with proportion 0.88, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.987e+06, grad_sumsq=2.987e+06, orig_rms_sq=1.000e+00 2024-08-09 12:41:01,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=197.40 vs. limit=8.25 2024-08-09 12:41:14,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1100.0, ans=5.55 2024-08-09 12:41:25,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1100.0, ans=0.4484375 2024-08-09 12:41:25,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=220.42 vs. limit=7.9125 2024-08-09 12:41:28,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1200.0, ans=0.44375 2024-08-09 12:41:28,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=205.22 vs. limit=7.95 2024-08-09 12:41:32,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=70.24 vs. 
limit=5.0 2024-08-09 12:41:34,604 WARNING [optim.py:496] (1/4) Scaling gradients by 0.011974900029599667, model_norm_threshold=88.8975601196289 2024-08-09 12:41:34,765 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.96, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.313e+07, grad_sumsq=5.313e+07, orig_rms_sq=1.000e+00 2024-08-09 12:41:36,438 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 28 from Vox, 41 from AS 2024-08-09 12:41:39,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=254.38 vs. limit=8.4 2024-08-09 12:41:52,544 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 15 from Vox, 29 from AS 2024-08-09 12:41:54,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1300.0, ans=0.4390625 2024-08-09 12:41:59,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=29.28 vs. limit=5.7 2024-08-09 12:42:06,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=86.93 vs. limit=8.025 2024-08-09 12:42:08,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=287.34 vs. limit=8.025 2024-08-09 12:42:15,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 150, loss[loss=0.178, beats_loss=0.02001, ecapa_loss=0.002045, whisper_loss=0.1375, over 17260.00 frames. ], tot_loss[loss=0.2436, beats_loss=0.04956, ecapa_loss=0.001893, whisper_loss=0.1751, over 2008814.09 frames. 
], batch size: 73, lr: 2.93e-02, grad_scale: 4.0 2024-08-09 12:42:22,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=301.76 vs. limit=8.0625 2024-08-09 12:42:26,427 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 from AS 2024-08-09 12:42:28,157 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04562794789671898, model_norm_threshold=88.8975601196289 2024-08-09 12:42:28,314 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.64, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.426e+06, grad_sumsq=2.426e+06, orig_rms_sq=1.000e+00 2024-08-09 12:42:29,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1500.0, ans=0.14375 2024-08-09 12:42:30,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=4.64 2024-08-09 12:42:31,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.96 vs. limit=8.7 2024-08-09 12:42:37,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. limit=5.4 2024-08-09 12:42:38,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=277.62 vs. limit=8.1 2024-08-09 12:42:46,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1700.0, ans=0.283 2024-08-09 12:42:46,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=59.34 vs. 
limit=8.1375 2024-08-09 12:42:59,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=167.92 vs. limit=8.775 2024-08-09 12:43:09,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=271.96 vs. limit=8.175 2024-08-09 12:43:12,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1800.0, ans=0.415625 2024-08-09 12:43:23,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1900.0, ans=0.4109375 2024-08-09 12:43:31,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1900.0, ans=0.12875 2024-08-09 12:43:34,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=4.8 2024-08-09 12:43:34,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+01 2.774e+01 3.640e+01 5.016e+01 7.424e+03, threshold=7.280e+01, percent-clipped=13.0 2024-08-09 12:43:34,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 200, loss[loss=0.2701, beats_loss=0.01416, ecapa_loss=0.002105, whisper_loss=0.2348, over 23088.00 frames. ], tot_loss[loss=0.2309, beats_loss=0.0389, ecapa_loss=0.001883, whisper_loss=0.1731, over 2405776.45 frames. ], batch size: 89, lr: 3.15e-02, grad_scale: 8.0 2024-08-09 12:43:35,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=94.36 vs. limit=8.25 2024-08-09 12:43:40,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=195.23 vs. 
limit=9.0 2024-08-09 12:43:41,446 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=130.03 vs. limit=8.25 2024-08-09 12:43:41,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=163.85 vs. limit=8.25 2024-08-09 12:43:43,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=9.0 2024-08-09 12:43:43,732 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS 2024-08-09 12:43:52,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=159.53 vs. limit=8.2875 2024-08-09 12:43:54,404 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06407187134027481, model_norm_threshold=72.79639434814453 2024-08-09 12:43:54,575 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.47, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.083e+05, grad_sumsq=6.083e+05, orig_rms_sq=1.000e+00 2024-08-09 12:44:05,472 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=96.54 vs. limit=8.325 2024-08-09 12:44:11,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=59.99 vs. limit=6.1 2024-08-09 12:44:13,405 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 from AS 2024-08-09 12:44:15,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2200.0, ans=0.8230000000000001 2024-08-09 12:44:15,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=426.17 vs. limit=8.325 2024-08-09 12:44:22,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=36.18 vs. limit=8.3625 2024-08-09 12:44:23,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=155.34 vs. limit=9.225 2024-08-09 12:44:25,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=70.93 vs. limit=8.3625 2024-08-09 12:44:26,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2300.0, ans=0.8195 2024-08-09 12:44:27,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=125.88 vs. limit=6.15 2024-08-09 12:44:27,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=34.99 vs. limit=9.225 2024-08-09 12:44:31,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2300.0, ans=0.3921875 2024-08-09 12:44:37,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=118.23 vs. 
limit=6.2 2024-08-09 12:44:40,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=218.80 vs. limit=9.3 2024-08-09 12:44:45,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=91.08 vs. limit=8.4 2024-08-09 12:44:48,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=35.89 vs. limit=8.4 2024-08-09 12:44:52,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 250, loss[loss=0.1931, beats_loss=0.0179, ecapa_loss=0.001854, whisper_loss=0.1566, over 16274.00 frames. ], tot_loss[loss=0.2223, beats_loss=0.03262, ecapa_loss=0.001871, whisper_loss=0.171, over 2681947.44 frames. ], batch size: 63, lr: 3.38e-02, grad_scale: 8.0 2024-08-09 12:45:05,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2500.0, ans=0.0421875 2024-08-09 12:45:14,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2600.0, ans=0.378125 2024-08-09 12:45:38,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=200.26 vs. limit=8.55 2024-08-09 12:45:41,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2800.0, ans=0.36875 2024-08-09 12:45:42,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=19.02 vs. limit=5.7 2024-08-09 12:45:42,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=62.41 vs. 
limit=8.55 2024-08-09 12:45:46,114 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS 2024-08-09 12:45:50,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2800.0, ans=6.75 2024-08-09 12:45:54,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2900.0, ans=0.27099999999999996 2024-08-09 12:45:59,970 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 15 from LS+wenet, 15 from Vox, 44 from AS 2024-08-09 12:46:04,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.99 vs. limit=8.5875 2024-08-09 12:46:07,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=170.63 vs. limit=8.5875 2024-08-09 12:46:10,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.536e+01 4.623e+01 6.113e+01 1.136e+03, threshold=9.245e+01, percent-clipped=13.0 2024-08-09 12:46:10,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 300, loss[loss=0.1891, beats_loss=0.02122, ecapa_loss=0.001679, whisper_loss=0.1511, over 18500.00 frames. ], tot_loss[loss=0.2138, beats_loss=0.02853, ecapa_loss=0.001837, whisper_loss=0.1669, over 2928853.92 frames. ], batch size: 72, lr: 3.60e-02, grad_scale: 8.0 2024-08-09 12:46:22,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=15.89 vs. 
limit=5.75 2024-08-09 12:46:31,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3100.0, ans=0.0403125 2024-08-09 12:46:39,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3100.0, ans=0.0403125 2024-08-09 12:46:39,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=112.86 vs. limit=8.6625 2024-08-09 12:46:59,826 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.635e+01 2024-08-09 12:47:05,627 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-09 12:47:18,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3400.0, ans=0.340625 2024-08-09 12:47:22,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=62.33 vs. limit=10.05 2024-08-09 12:47:23,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3400.0, ans=0.07250000000000001 2024-08-09 12:47:27,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=5.4 2024-08-09 12:47:28,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=10.125 2024-08-09 12:47:28,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 350, loss[loss=0.1527, beats_loss=0.01747, ecapa_loss=0.001669, whisper_loss=0.1185, over 17157.00 frames. 
], tot_loss[loss=0.208, beats_loss=0.02549, ecapa_loss=0.001804, whisper_loss=0.1644, over 3122156.44 frames. ], batch size: 67, lr: 3.83e-02, grad_scale: 8.0 2024-08-09 12:47:29,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=69.98 vs. limit=8.8125 2024-08-09 12:47:35,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=21.34 vs. limit=8.8125 2024-08-09 12:47:40,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3500.0, ans=6.75 2024-08-09 12:47:44,615 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 from AS 2024-08-09 12:47:51,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=5.4399999999999995 2024-08-09 12:47:52,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=85.20 vs. limit=8.85 2024-08-09 12:47:54,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=34.86 vs. limit=8.85 2024-08-09 12:47:57,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=17.26 vs. limit=8.85 2024-08-09 12:48:02,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=23.76 vs. 
limit=8.8875 2024-08-09 12:48:06,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700.0, ans=0.263 2024-08-09 12:48:14,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=84.91 vs. limit=8.925 2024-08-09 12:48:19,877 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 25 from LS+wenet, 15 from Vox, 18 from AS 2024-08-09 12:48:29,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.27 vs. limit=10.425 2024-08-09 12:48:40,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3900.0, ans=0.0378125 2024-08-09 12:48:40,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=54.02 vs. limit=10.425 2024-08-09 12:48:42,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=88.28 vs. limit=10.425 2024-08-09 12:48:43,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.68 vs. limit=6.95 2024-08-09 12:48:44,684 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS 2024-08-09 12:48:46,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.862e+01 3.339e+01 4.177e+01 8.866e+01, threshold=6.678e+01, percent-clipped=0.0 2024-08-09 12:48:46,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 400, loss[loss=0.1675, beats_loss=0.01563, ecapa_loss=0.001655, whisper_loss=0.1353, over 18055.00 frames. 
], tot_loss[loss=0.2011, beats_loss=0.02337, ecapa_loss=0.001769, whisper_loss=0.1601, over 3273773.10 frames. ], batch size: 74, lr: 4.05e-02, grad_scale: 16.0 2024-08-09 12:49:02,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=5.64 2024-08-09 12:49:04,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=34.61 vs. limit=10.575 2024-08-09 12:49:08,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.76 vs. limit=7.05 2024-08-09 12:49:11,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.79 vs. limit=5.0 2024-08-09 12:49:23,039 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 from AS 2024-08-09 12:49:28,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=64.83 vs. limit=9.075 2024-08-09 12:49:29,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.02 vs. limit=7.1 2024-08-09 12:49:30,420 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 from AS 2024-08-09 12:49:37,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=9.1125 2024-08-09 12:49:39,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.38 vs. 
limit=6.075 2024-08-09 12:49:40,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=10.725 2024-08-09 12:49:44,416 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 from AS 2024-08-09 12:49:49,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4400.0, ans=0.29375 2024-08-09 12:49:50,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.99 vs. limit=10.8 2024-08-09 12:50:00,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4400.0, ans=0.29375 2024-08-09 12:50:03,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 450, loss[loss=0.1832, beats_loss=0.02032, ecapa_loss=0.001608, whisper_loss=0.1468, over 21923.00 frames. ], tot_loss[loss=0.1964, beats_loss=0.02173, ecapa_loss=0.001713, whisper_loss=0.1576, over 3436019.40 frames. ], batch size: 90, lr: 4.28e-02, grad_scale: 16.0 2024-08-09 12:50:03,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.26 vs. limit=10.875 2024-08-09 12:50:07,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=10.875 2024-08-09 12:50:11,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4500.0, ans=0.2890625 2024-08-09 12:50:25,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.83 vs. 
limit=10.95 2024-08-09 12:50:31,308 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.741e+00 2024-08-09 12:50:35,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4700.0, ans=0.0 2024-08-09 12:50:36,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=42.32 vs. limit=9.2625 2024-08-09 12:50:43,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.27 vs. limit=11.025 2024-08-09 12:50:48,841 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-09 12:50:49,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=9.3 2024-08-09 12:50:50,313 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 12:51:07,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=9.3375 2024-08-09 12:51:13,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4900.0, ans=0.009804347826086957 2024-08-09 12:51:16,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4900.0, ans=0.2703125 2024-08-09 12:51:16,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.49 vs. 
limit=7.45 2024-08-09 12:51:18,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+01 2.551e+01 3.113e+01 4.254e+01 7.113e+01, threshold=6.225e+01, percent-clipped=1.0 2024-08-09 12:51:18,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 500, loss[loss=0.1782, beats_loss=0.01745, ecapa_loss=0.00131, whisper_loss=0.1477, over 19317.00 frames. ], tot_loss[loss=0.1925, beats_loss=0.02039, ecapa_loss=0.001673, whisper_loss=0.1554, over 3506467.71 frames. ], batch size: 73, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:51:24,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5000.0, ans=0.04583333333333334 2024-08-09 12:51:31,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=21.98 vs. limit=9.375 2024-08-09 12:51:35,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5100.0, ans=0.0 2024-08-09 12:51:39,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=9.4125 2024-08-09 12:51:42,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=16.88 vs. limit=9.4125 2024-08-09 12:51:43,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=38.10 vs. limit=9.4125 2024-08-09 12:51:46,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=9.4125 2024-08-09 12:51:52,167 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 18 from Vox, 44 from AS 2024-08-09 12:52:02,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5200.0, ans=0.25625 2024-08-09 12:52:07,150 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 12:52:08,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5300.0, ans=0.009717391304347827 2024-08-09 12:52:10,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=9.4875 2024-08-09 12:52:24,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5400.0, ans=0.246875 2024-08-09 12:52:35,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 550, loss[loss=0.1371, beats_loss=0.01531, ecapa_loss=0.001492, whisper_loss=0.1069, over 19763.00 frames. ], tot_loss[loss=0.1896, beats_loss=0.01929, ecapa_loss=0.001629, whisper_loss=0.1541, over 3595304.04 frames. ], batch size: 77, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:52:38,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=41.83 vs. limit=11.625 2024-08-09 12:52:40,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5500.0, ans=0.245 2024-08-09 12:52:51,110 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
31 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 12:52:52,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5600.0, ans=0.2375 2024-08-09 12:52:56,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5600.0, ans=0.2375 2024-08-09 12:53:02,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=37.42 vs. limit=9.6 2024-08-09 12:53:07,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5700.0, ans=0.23281249999999998 2024-08-09 12:53:14,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5700.0, ans=0.23281249999999998 2024-08-09 12:53:15,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=6.279999999999999 2024-08-09 12:53:17,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5700.0, ans=0.23281249999999998 2024-08-09 12:53:32,183 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 17 from Vox, 32 from AS 2024-08-09 12:53:39,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5900.0, ans=0.0 2024-08-09 12:53:44,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5900.0, ans=0.6935 2024-08-09 12:53:46,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5900.0, ans=0.2234375 2024-08-09 12:53:52,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+01 2.262e+01 2.880e+01 3.640e+01 5.434e+01, threshold=5.761e+01, percent-clipped=0.0 2024-08-09 12:53:52,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 600, loss[loss=0.1928, beats_loss=0.01306, ecapa_loss=0.001489, whisper_loss=0.1649, over 16864.00 frames. ], tot_loss[loss=0.1867, beats_loss=0.01847, ecapa_loss=0.001589, whisper_loss=0.1523, over 3625828.86 frames. ], batch size: 67, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:54:13,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.15 vs. limit=12.075 2024-08-09 12:54:14,623 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 from AS 2024-08-09 12:54:21,169 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 from AS 2024-08-09 12:54:30,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=6200.0, ans=0.20937499999999998 2024-08-09 12:54:33,025 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 8 from Vox, 29 from AS 2024-08-09 12:54:46,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. 
limit=6.52 2024-08-09 12:54:56,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=26.60 vs. limit=9.9 2024-08-09 12:55:10,354 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 650, loss[loss=0.1811, beats_loss=0.01684, ecapa_loss=0.001268, whisper_loss=0.1516, over 22296.00 frames. ], tot_loss[loss=0.1838, beats_loss=0.01785, ecapa_loss=0.001547, whisper_loss=0.1504, over 3694109.40 frames. ], batch size: 86, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:55:32,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=30.90 vs. limit=9.975 2024-08-09 12:55:38,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=9.975 2024-08-09 12:55:39,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=30.97 vs. limit=10.0125 2024-08-09 12:55:49,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=12.525 2024-08-09 12:55:54,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=6800.0, ans=0.18125000000000002 2024-08-09 12:55:57,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. 
limit=10.05 2024-08-09 12:56:19,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=6900.0, ans=0.1765625 2024-08-09 12:56:21,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=12.675 2024-08-09 12:56:21,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=10.0875 2024-08-09 12:56:24,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.370e+01 2.323e+01 2.699e+01 3.837e+01 7.112e+01, threshold=5.398e+01, percent-clipped=6.0 2024-08-09 12:56:25,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 700, loss[loss=0.1711, beats_loss=0.01496, ecapa_loss=0.001206, whisper_loss=0.1441, over 18206.00 frames. ], tot_loss[loss=0.1813, beats_loss=0.01735, ecapa_loss=0.001509, whisper_loss=0.1488, over 3729330.14 frames. ], batch size: 68, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:56:27,908 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 from AS 2024-08-09 12:56:28,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.17 vs. limit=12.75 2024-08-09 12:56:45,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=4.0649999999999995 2024-08-09 12:56:59,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=7200.0, ans=0.03666666666666667 2024-08-09 12:57:02,569 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
35 from LS+wenet, 15 from Vox, 35 from AS 2024-08-09 12:57:08,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=12.9 2024-08-09 12:57:11,687 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 from AS 2024-08-09 12:57:23,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=6.96 2024-08-09 12:57:29,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=10.275 2024-08-09 12:57:35,068 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=10.275 2024-08-09 12:57:40,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 750, loss[loss=0.1737, beats_loss=0.01558, ecapa_loss=0.00126, whisper_loss=0.1455, over 23544.00 frames. ], tot_loss[loss=0.1768, beats_loss=0.01703, ecapa_loss=0.001465, whisper_loss=0.1451, over 3768108.00 frames. ], batch size: 91, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:57:55,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=7600.0, ans=0.009217391304347827 2024-08-09 12:57:58,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=6.9 2024-08-09 12:57:58,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=10.35 2024-08-09 12:57:59,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=13.2 2024-08-09 12:58:02,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=7600.0, ans=0.14375 2024-08-09 12:58:03,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=7600.0, ans=0.026250000000000002 2024-08-09 12:58:07,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=10.35 2024-08-09 12:58:09,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. limit=6.925 2024-08-09 12:58:15,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7700.0, ans=0.13906249999999998 2024-08-09 12:58:34,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=13.35 2024-08-09 12:58:39,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=13.35 2024-08-09 12:58:42,202 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 from AS 2024-08-09 12:58:57,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.305e+01 2.802e+01 3.610e+01 6.792e+01, threshold=5.604e+01, percent-clipped=3.0 2024-08-09 12:58:57,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 800, loss[loss=0.1779, beats_loss=0.01384, ecapa_loss=0.001255, whisper_loss=0.1515, over 14539.00 frames. ], tot_loss[loss=0.1732, beats_loss=0.01672, ecapa_loss=0.00142, whisper_loss=0.1423, over 3795390.04 frames. 
], batch size: 55, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 12:59:04,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=8000.0, ans=0.83 2024-08-09 12:59:09,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8000.0, ans=0.22 2024-08-09 12:59:12,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=8100.0, ans=0.125 2024-08-09 12:59:20,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=8100.0, ans=0.125 2024-08-09 12:59:25,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=9.05 2024-08-09 12:59:34,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=9.1 2024-08-09 12:59:36,461 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 from AS 2024-08-09 12:59:45,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=10.6125 2024-08-09 12:59:53,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=8300.0, ans=0.125 2024-08-09 13:00:00,674 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 17 from Vox, 44 from AS 2024-08-09 13:00:01,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8400.0, ans=0.125 2024-08-09 13:00:04,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=7.1 2024-08-09 13:00:10,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=8400.0, ans=0.03166666666666667 2024-08-09 13:00:13,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 850, loss[loss=0.1342, beats_loss=0.01484, ecapa_loss=0.0009516, whisper_loss=0.1099, over 18341.00 frames. ], tot_loss[loss=0.1699, beats_loss=0.01642, ecapa_loss=0.00137, whisper_loss=0.1398, over 3818908.88 frames. ], batch size: 66, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 13:00:14,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=13.875 2024-08-09 13:00:14,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.83 vs. limit=9.25 2024-08-09 13:00:25,406 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 from AS 2024-08-09 13:00:35,640 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 from AS 2024-08-09 13:00:46,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8700.0, ans=0.213 2024-08-09 13:00:48,201 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 from AS 2024-08-09 13:00:54,596 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
18 from LS+wenet, 13 from Vox, 26 from AS 2024-08-09 13:00:57,468 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 from AS 2024-08-09 13:00:57,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=8800.0, ans=0.125 2024-08-09 13:01:00,132 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 21 from Vox, 17 from AS 2024-08-09 13:01:26,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+01 2.129e+01 2.561e+01 3.167e+01 6.018e+01, threshold=5.121e+01, percent-clipped=3.0 2024-08-09 13:01:26,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 900, loss[loss=0.1403, beats_loss=0.01842, ecapa_loss=0.001078, whisper_loss=0.1111, over 19644.00 frames. ], tot_loss[loss=0.1685, beats_loss=0.01612, ecapa_loss=0.001318, whisper_loss=0.1392, over 3813656.85 frames. ], batch size: 80, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:01:27,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2024-08-09 13:01:35,216 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 13:01:48,061 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 25 from Vox, 45 from AS 2024-08-09 13:01:50,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=10.9125 2024-08-09 13:01:54,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. limit=7.3 2024-08-09 13:01:59,324 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS 2024-08-09 13:02:29,792 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
12 from LS+wenet, 17 from Vox, 37 from AS 2024-08-09 13:02:30,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9400.0, ans=0.20600000000000002 2024-08-09 13:02:37,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 950, loss[loss=0.1518, beats_loss=0.01449, ecapa_loss=0.001008, whisper_loss=0.1272, over 18537.00 frames. ], tot_loss[loss=0.1639, beats_loss=0.01598, ecapa_loss=0.001272, whisper_loss=0.1352, over 3772959.75 frames. ], batch size: 71, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:02:38,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9500.0, ans=0.20500000000000002 2024-08-09 13:02:38,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=9500.0, ans=0.027083333333333338 2024-08-09 13:02:44,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=11.0625 2024-08-09 13:03:04,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=11.1 2024-08-09 13:03:21,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=9800.0, ans=0.125 2024-08-09 13:03:38,559 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 from AS 2024-08-09 13:03:45,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. 
limit=4.485 2024-08-09 13:03:48,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=9900.0, ans=0.125 2024-08-09 13:03:50,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+01 2.154e+01 2.525e+01 3.011e+01 6.635e+01, threshold=5.049e+01, percent-clipped=1.0 2024-08-09 13:03:50,362 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1000, loss[loss=0.167, beats_loss=0.01398, ecapa_loss=0.001104, whisper_loss=0.142, over 15055.00 frames. ], tot_loss[loss=0.161, beats_loss=0.01585, ecapa_loss=0.001219, whisper_loss=0.1329, over 3776569.50 frames. ], batch size: 58, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:04:03,045 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 13:04:07,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=11.2875 2024-08-09 13:04:09,258 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 24 from Vox, 32 from AS 2024-08-09 13:04:11,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10100.0, ans=0.024583333333333336 2024-08-09 13:04:39,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=10300.0, ans=0.125 2024-08-09 13:04:41,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=10300.0, ans=11.3625 2024-08-09 13:04:54,154 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
36 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 13:04:57,097 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.667e+00 2024-08-09 13:04:58,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=10400.0, ans=10.0 2024-08-09 13:04:58,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=10400.0, ans=0.09899494936611666 2024-08-09 13:05:01,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=10400.0, ans=0.04949747468305833 2024-08-09 13:05:04,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1050, loss[loss=0.09086, beats_loss=0.0144, ecapa_loss=0.001082, whisper_loss=0.06564, over 14217.00 frames. ], tot_loss[loss=0.1582, beats_loss=0.01568, ecapa_loss=0.001175, whisper_loss=0.1307, over 3761200.59 frames. ], batch size: 57, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:05:08,628 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 13:05:13,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=10500.0, ans=0.125 2024-08-09 13:05:24,264 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 13:05:35,885 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 13:05:39,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10700.0, ans=0.193 2024-08-09 13:05:42,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10700.0, ans=0.193 2024-08-09 13:05:45,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.80 vs. limit=7.675 2024-08-09 13:05:51,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=11.55 2024-08-09 13:05:57,283 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 13:06:04,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=10900.0, ans=0.5185000000000001 2024-08-09 13:06:07,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=10900.0, ans=0.02125 2024-08-09 13:06:11,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=15.675 2024-08-09 13:06:18,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.283e+01 2.878e+01 3.739e+01 7.694e+01, threshold=5.756e+01, percent-clipped=7.0 2024-08-09 13:06:18,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1100, loss[loss=0.1181, beats_loss=0.01261, ecapa_loss=0.001122, whisper_loss=0.09431, over 15469.00 frames. ], tot_loss[loss=0.1573, beats_loss=0.01551, ecapa_loss=0.001135, whisper_loss=0.1304, over 3780889.12 frames. 
], batch size: 61, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:06:20,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=11.625 2024-08-09 13:06:32,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=11000.0, ans=11.625 2024-08-09 13:06:37,338 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 13:06:39,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.825 2024-08-09 13:06:40,237 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 13:06:42,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=11100.0, ans=0.125 2024-08-09 13:06:43,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=11100.0, ans=0.008456521739130436 2024-08-09 13:06:47,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=11200.0, ans=0.020000000000000004 2024-08-09 13:07:09,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=11300.0, ans=0.125 2024-08-09 13:07:11,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=11300.0, ans=0.125 2024-08-09 13:07:15,429 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 13:07:22,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11400.0, ans=0.0 2024-08-09 13:07:25,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=11400.0, ans=0.01916666666666667 2024-08-09 13:07:31,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1150, loss[loss=0.1251, beats_loss=0.01993, ecapa_loss=0.0006973, whisper_loss=0.09818, over 15746.00 frames. ], tot_loss[loss=0.1544, beats_loss=0.01542, ecapa_loss=0.001096, whisper_loss=0.128, over 3789212.31 frames. ], batch size: 61, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:07:35,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=11500.0, ans=0.05 2024-08-09 13:07:42,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=11500.0, ans=0.018750000000000003 2024-08-09 13:07:42,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=11.8125 2024-08-09 13:07:49,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.21 vs. limit=10.8 2024-08-09 13:07:51,513 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 13:07:52,664 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-09 13:07:57,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=11600.0, ans=0.018333333333333333 2024-08-09 13:07:58,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11600.0, ans=0.184 2024-08-09 13:08:21,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=11800.0, ans=0.125 2024-08-09 13:08:29,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=11900.0, ans=0.008282608695652175 2024-08-09 13:08:30,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=16.425 2024-08-09 13:08:33,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=11900.0, ans=0.125 2024-08-09 13:08:39,453 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 13:08:41,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=11.9625 2024-08-09 13:08:45,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.329e+01 2.685e+01 3.204e+01 5.571e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-09 13:08:45,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1200, loss[loss=0.1576, beats_loss=0.01153, ecapa_loss=0.001217, whisper_loss=0.1339, over 15670.00 frames. ], tot_loss[loss=0.1533, beats_loss=0.01526, ecapa_loss=0.001063, whisper_loss=0.1274, over 3775251.45 frames. 
], batch size: 67, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:08:52,282 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 13:08:59,468 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 13:09:07,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12100.0, ans=0.179 2024-08-09 13:09:17,552 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 13:09:19,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.52 vs. limit=16.65 2024-08-09 13:09:57,786 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-09 13:09:58,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1250, loss[loss=0.1532, beats_loss=0.01808, ecapa_loss=0.000771, whisper_loss=0.1275, over 16216.00 frames. ], tot_loss[loss=0.1521, beats_loss=0.01495, ecapa_loss=0.001033, whisper_loss=0.1268, over 3767544.86 frames. ], batch size: 64, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:10:11,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=12.1875 2024-08-09 13:10:11,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=12.1875 2024-08-09 13:10:13,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=12600.0, ans=0.0 2024-08-09 13:10:14,531 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 13:10:33,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=12700.0, ans=0.013750000000000005 2024-08-09 13:10:38,172 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-09 13:10:44,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0 2024-08-09 13:11:01,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.3375 2024-08-09 13:11:07,981 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 13:11:12,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.459e+01 3.175e+01 4.087e+01 8.300e+01, threshold=6.351e+01, percent-clipped=6.0 2024-08-09 13:11:13,022 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1300, loss[loss=0.1507, beats_loss=0.01551, ecapa_loss=0.0008896, whisper_loss=0.1263, over 22008.00 frames. ], tot_loss[loss=0.1507, beats_loss=0.0148, ecapa_loss=0.001002, whisper_loss=0.1258, over 3806268.40 frames. ], batch size: 89, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:11:18,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.61 vs. 
limit=8.25 2024-08-09 13:11:25,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13000.0, ans=0.125 2024-08-09 13:11:31,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=13100.0, ans=0.012083333333333335 2024-08-09 13:11:32,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=13100.0, ans=0.125 2024-08-09 13:11:36,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=13100.0, ans=0.4415 2024-08-09 13:11:59,290 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 13:11:59,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13300.0, ans=0.16699999999999998 2024-08-09 13:12:10,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=17.475 2024-08-09 13:12:23,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13400.0, ans=0.16599999999999998 2024-08-09 13:12:27,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1350, loss[loss=0.1323, beats_loss=0.01587, ecapa_loss=0.000838, whisper_loss=0.1081, over 15364.00 frames. ], tot_loss[loss=0.1485, beats_loss=0.01473, ecapa_loss=0.0009709, whisper_loss=0.1241, over 3815338.55 frames. ], batch size: 62, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:12:47,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=17.7 2024-08-09 13:12:53,956 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 13:12:58,988 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 13:13:25,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=13900.0, ans=0.125 2024-08-09 13:13:34,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=13900.0, ans=17.925 2024-08-09 13:13:40,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.453e+01 2.894e+01 3.668e+01 7.407e+01, threshold=5.787e+01, percent-clipped=1.0 2024-08-09 13:13:41,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1400, loss[loss=0.1503, beats_loss=0.01512, ecapa_loss=0.0007847, whisper_loss=0.1273, over 22512.00 frames. ], tot_loss[loss=0.1479, beats_loss=0.01463, ecapa_loss=0.0009475, whisper_loss=0.1238, over 3842609.44 frames. ], batch size: 88, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:13:59,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=12.7875 2024-08-09 13:14:05,485 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 13:14:23,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.19 vs. limit=12.1 2024-08-09 13:14:27,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.8625 2024-08-09 13:14:31,423 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
14 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 13:14:43,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=12.9 2024-08-09 13:14:46,486 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 13:14:47,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=14400.0, ans=0.0077391304347826095 2024-08-09 13:14:56,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1450, loss[loss=0.1689, beats_loss=0.01281, ecapa_loss=0.0008693, whisper_loss=0.1474, over 18739.00 frames. ], tot_loss[loss=0.1458, beats_loss=0.01455, ecapa_loss=0.0009237, whisper_loss=0.122, over 3846357.56 frames. ], batch size: 73, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:15:17,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=12.9375 2024-08-09 13:15:21,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.52 vs. limit=18.375 2024-08-09 13:15:48,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.13 vs. limit=12.35 2024-08-09 13:15:49,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=14700.0, ans=0.125 2024-08-09 13:16:05,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14800.0, ans=0.125 2024-08-09 13:16:18,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. 
limit=13.0875 2024-08-09 13:16:25,726 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-09 13:16:27,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=13.0875 2024-08-09 13:16:31,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.402e+01 3.110e+01 4.073e+01 8.821e+01, threshold=6.219e+01, percent-clipped=9.0 2024-08-09 13:16:31,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1500, loss[loss=0.159, beats_loss=0.01244, ecapa_loss=0.0009345, whisper_loss=0.1372, over 17220.00 frames. ], tot_loss[loss=0.1446, beats_loss=0.01449, ecapa_loss=0.0008986, whisper_loss=0.1212, over 3837966.87 frames. ], batch size: 70, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:16:41,386 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 15 from LS+wenet, 26 from Vox, 49 fro AS 2024-08-09 13:16:49,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15100.0, ans=0.125 2024-08-09 13:17:34,704 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 13:17:44,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=15400.0, ans=0.0 2024-08-09 13:17:50,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=15400.0, ans=0.125 2024-08-09 13:17:52,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1550, loss[loss=0.1656, beats_loss=0.01139, ecapa_loss=0.000872, whisper_loss=0.1455, over 23105.00 frames. ], tot_loss[loss=0.1444, beats_loss=0.01443, ecapa_loss=0.000872, whisper_loss=0.1212, over 3834966.00 frames. 
], batch size: 93, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:17:53,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=13.3125 2024-08-09 13:17:58,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. limit=8.875 2024-08-09 13:18:15,250 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:18:19,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=13.35 2024-08-09 13:18:38,323 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 13:18:44,234 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 13:18:44,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=15800.0, ans=0.0008333333333333387 2024-08-09 13:18:56,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=15900.0, ans=0.00041666666666666935 2024-08-09 13:19:03,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=15900.0, ans=0.125 2024-08-09 13:19:06,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=15900.0, ans=0.125 2024-08-09 13:19:12,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.449e+01 2.841e+01 3.798e+01 6.790e+01, threshold=5.683e+01, percent-clipped=3.0 2024-08-09 13:19:12,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1600, loss[loss=0.1473, beats_loss=0.01458, ecapa_loss=0.0007661, 
whisper_loss=0.125, over 16101.00 frames. ], tot_loss[loss=0.1433, beats_loss=0.01429, ecapa_loss=0.0008522, whisper_loss=0.1205, over 3820096.57 frames. ], batch size: 63, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:19:18,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=16000.0, ans=0.07 2024-08-09 13:19:20,797 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-09 13:19:33,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16100.0, ans=0.125 2024-08-09 13:19:55,953 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=13.575 2024-08-09 13:20:11,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=16300.0, ans=0.125 2024-08-09 13:20:33,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1650, loss[loss=0.1407, beats_loss=0.01348, ecapa_loss=0.0007381, whisper_loss=0.1199, over 13287.00 frames. ], tot_loss[loss=0.1433, beats_loss=0.01416, ecapa_loss=0.0008308, whisper_loss=0.1208, over 3832557.01 frames. ], batch size: 55, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:21:02,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16600.0, ans=0.134 2024-08-09 13:21:06,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=16700.0, ans=0.125 2024-08-09 13:21:36,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132 2024-08-09 13:21:45,957 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 13:21:50,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=16900.0, ans=0.125 2024-08-09 13:21:50,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=16900.0, ans=0.04949747468305833 2024-08-09 13:21:54,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.579e+01 3.058e+01 4.131e+01 8.941e+01, threshold=6.115e+01, percent-clipped=7.0 2024-08-09 13:21:54,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1700, loss[loss=0.1414, beats_loss=0.01097, ecapa_loss=0.0008166, whisper_loss=0.1223, over 17219.00 frames. ], tot_loss[loss=0.1428, beats_loss=0.01415, ecapa_loss=0.0008117, whisper_loss=0.1205, over 3854682.06 frames. ], batch size: 67, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:21:55,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.09 vs. limit=13.5 2024-08-09 13:21:57,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=20.25 2024-08-09 13:21:58,100 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 13:22:15,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=17100.0, ans=0.007152173913043479 2024-08-09 13:22:18,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=17100.0, ans=0.3015 2024-08-09 13:22:20,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=17100.0, ans=0.007152173913043479 2024-08-09 13:22:23,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=17100.0, ans=0.007152173913043479 2024-08-09 13:22:34,855 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 13:22:50,452 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 13:23:07,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=17400.0, ans=0.0 2024-08-09 13:23:11,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=17400.0, ans=0.0 2024-08-09 13:23:13,123 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1750, loss[loss=0.1368, beats_loss=0.01256, ecapa_loss=0.0007211, whisper_loss=0.117, over 18232.00 frames. ], tot_loss[loss=0.1414, beats_loss=0.01415, ecapa_loss=0.000794, whisper_loss=0.1193, over 3846395.48 frames. ], batch size: 69, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:23:13,315 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 13:23:15,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=17500.0, ans=0.125 2024-08-09 13:23:18,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=14.0625 2024-08-09 13:23:20,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.43 vs. limit=9.375 2024-08-09 13:23:38,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17600.0, ans=0.124 2024-08-09 13:23:38,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=17600.0, ans=0.28400000000000003 2024-08-09 13:23:43,457 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 13:24:17,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=17900.0, ans=0.125 2024-08-09 13:24:19,783 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 13:24:27,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.728e+01 3.350e+01 4.234e+01 7.677e+01, threshold=6.699e+01, percent-clipped=2.0 2024-08-09 13:24:27,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1800, loss[loss=0.146, beats_loss=0.01096, ecapa_loss=0.0007479, whisper_loss=0.1275, over 19446.00 frames. ], tot_loss[loss=0.1408, beats_loss=0.01408, ecapa_loss=0.0007753, whisper_loss=0.119, over 3861507.28 frames. 
], batch size: 71, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:24:37,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=14.25 2024-08-09 13:24:51,515 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 13:25:00,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=18200.0, ans=0.125 2024-08-09 13:25:07,440 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 13:25:08,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=18200.0, ans=0.09899494936611666 2024-08-09 13:25:21,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.85 vs. limit=21.225 2024-08-09 13:25:43,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1850, loss[loss=0.1341, beats_loss=0.01453, ecapa_loss=0.0006625, whisper_loss=0.113, over 15596.00 frames. ], tot_loss[loss=0.1404, beats_loss=0.01406, ecapa_loss=0.0007659, whisper_loss=0.1187, over 3847362.09 frames. ], batch size: 57, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:25:43,455 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 13:25:58,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.16 vs. limit=14.3 2024-08-09 13:26:00,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.15 vs. 
limit=21.45 2024-08-09 13:26:00,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=5.79 2024-08-09 13:26:21,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=18700.0, ans=0.125 2024-08-09 13:26:25,417 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 13:26:55,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=18900.0, ans=0.035 2024-08-09 13:26:57,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=14.5875 2024-08-09 13:27:01,745 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 13:27:03,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.592e+01 3.002e+01 4.008e+01 1.371e+02, threshold=6.005e+01, percent-clipped=3.0 2024-08-09 13:27:03,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1900, loss[loss=0.127, beats_loss=0.01535, ecapa_loss=0.0007076, whisper_loss=0.1046, over 20158.00 frames. ], tot_loss[loss=0.1396, beats_loss=0.0141, ecapa_loss=0.0007654, whisper_loss=0.1179, over 3815087.05 frames. ], batch size: 84, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:27:19,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19100.0, ans=0.10900000000000001 2024-08-09 13:27:31,584 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 13:27:39,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=19200.0, ans=0.22799999999999998 2024-08-09 13:27:42,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=11.68 2024-08-09 13:27:48,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=19300.0, ans=0.125 2024-08-09 13:27:50,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=19300.0, ans=0.025 2024-08-09 13:28:20,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 1950, loss[loss=0.1121, beats_loss=0.01703, ecapa_loss=0.0006583, whisper_loss=0.08845, over 21296.00 frames. ], tot_loss[loss=0.1383, beats_loss=0.01402, ecapa_loss=0.0007682, whisper_loss=0.1166, over 3859589.47 frames. ], batch size: 89, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:28:20,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=19500.0, ans=0.21750000000000003 2024-08-09 13:28:38,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=19600.0, ans=0.125 2024-08-09 13:28:47,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=19600.0, ans=0.0 2024-08-09 13:28:52,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=14.8875 2024-08-09 13:28:55,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. 
limit=14.8875 2024-08-09 13:28:57,465 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 13:29:07,112 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 13:29:07,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19800.0, ans=0.10200000000000001 2024-08-09 13:29:18,507 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 13:29:34,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=20000.0, ans=0.07 2024-08-09 13:29:35,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.629e+01 3.262e+01 3.981e+01 7.661e+01, threshold=6.525e+01, percent-clipped=2.0 2024-08-09 13:29:35,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2000, loss[loss=0.1218, beats_loss=0.01717, ecapa_loss=0.0007873, whisper_loss=0.09672, over 21498.00 frames. ], tot_loss[loss=0.1378, beats_loss=0.01406, ecapa_loss=0.0007625, whisper_loss=0.1161, over 3852033.19 frames. ], batch size: 94, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:29:35,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=20000.0, ans=0.2 2024-08-09 13:29:35,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:29:38,159 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
18 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-09 13:29:38,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=20000.0, ans=0.0 2024-08-09 13:29:44,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=20000.0, ans=0.5 2024-08-09 13:29:50,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=20100.0, ans=0.07 2024-08-09 13:29:51,930 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 13:29:53,432 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 25 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-09 13:30:02,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=20100.0, ans=0.04949747468305833 2024-08-09 13:30:06,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.11 vs. limit=15.0 2024-08-09 13:30:26,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=20300.0, ans=0.125 2024-08-09 13:30:41,974 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 13:30:54,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2050, loss[loss=0.1576, beats_loss=0.01136, ecapa_loss=0.0008007, whisper_loss=0.1382, over 15682.00 frames. ], tot_loss[loss=0.1379, beats_loss=0.014, ecapa_loss=0.0007578, whisper_loss=0.1164, over 3831541.64 frames. ], batch size: 62, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:31:10,539 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-09 13:31:23,364 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 13:31:29,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20700.0, ans=0.1 2024-08-09 13:31:34,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-09 13:31:52,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=20900.0, ans=0.2 2024-08-09 13:31:54,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=20900.0, ans=0.125 2024-08-09 13:32:07,010 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 13:32:08,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.818e+01 3.204e+01 4.044e+01 7.345e+01, threshold=6.407e+01, percent-clipped=1.0 2024-08-09 13:32:08,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2100, loss[loss=0.1585, beats_loss=0.01402, ecapa_loss=0.0007252, whisper_loss=0.1372, over 23588.00 frames. ], tot_loss[loss=0.1369, beats_loss=0.0141, ecapa_loss=0.0007486, whisper_loss=0.1153, over 3836642.42 frames. ], batch size: 94, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:32:24,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.54 vs. limit=22.5 2024-08-09 13:32:31,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21100.0, ans=0.1 2024-08-09 13:32:39,799 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 13:33:10,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=21400.0, ans=0.125 2024-08-09 13:33:13,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21400.0, ans=0.125 2024-08-09 13:33:19,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=21400.0, ans=0.125 2024-08-09 13:33:25,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2150, loss[loss=0.1299, beats_loss=0.0152, ecapa_loss=0.00071, whisper_loss=0.1076, over 20973.00 frames. ], tot_loss[loss=0.1366, beats_loss=0.01406, ecapa_loss=0.0007347, whisper_loss=0.1152, over 3837442.98 frames. ], batch size: 88, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:33:32,512 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 13:33:55,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=21700.0, ans=0.125 2024-08-09 13:34:00,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=21700.0, ans=0.125 2024-08-09 13:34:12,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=21800.0, ans=0.125 2024-08-09 13:34:25,521 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 13:34:27,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=21900.0, ans=0.0 2024-08-09 13:34:28,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. 
limit=15.0 2024-08-09 13:34:38,092 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-09 13:34:38,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=21900.0, ans=0.125 2024-08-09 13:34:38,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.96 vs. limit=15.0 2024-08-09 13:34:39,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=21900.0, ans=0.05 2024-08-09 13:34:42,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.673e+01 3.209e+01 4.237e+01 7.311e+01, threshold=6.417e+01, percent-clipped=1.0 2024-08-09 13:34:42,189 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2200, loss[loss=0.1444, beats_loss=0.01306, ecapa_loss=0.0008478, whisper_loss=0.1229, over 17123.00 frames. ], tot_loss[loss=0.136, beats_loss=0.014, ecapa_loss=0.0007316, whisper_loss=0.1147, over 3823063.27 frames. ], batch size: 72, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:35:05,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=22100.0, ans=0.006065217391304348 2024-08-09 13:35:09,247 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
26 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-09 13:35:20,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=22200.0, ans=0.0 2024-08-09 13:35:33,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=22300.0, ans=0.0 2024-08-09 13:35:42,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=22300.0, ans=0.006021739130434783 2024-08-09 13:35:43,523 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 13:35:47,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2024-08-09 13:35:50,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=22400.0, ans=0.125 2024-08-09 13:36:01,359 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2250, loss[loss=0.1477, beats_loss=0.01476, ecapa_loss=0.0007133, whisper_loss=0.1258, over 21871.00 frames. ], tot_loss[loss=0.1358, beats_loss=0.01401, ecapa_loss=0.0007272, whisper_loss=0.1145, over 3829581.21 frames. ], batch size: 88, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:36:13,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.23 vs. limit=10.0 2024-08-09 13:36:16,890 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 13:36:52,422 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-09 13:37:10,154 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
13 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 13:37:12,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=22800.0, ans=0.00591304347826087 2024-08-09 13:37:32,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=22900.0, ans=0.125 2024-08-09 13:37:43,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.94 vs. limit=22.5 2024-08-09 13:37:45,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.951e+01 3.575e+01 4.087e+01 9.473e+01, threshold=7.150e+01, percent-clipped=2.0 2024-08-09 13:37:45,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2300, loss[loss=0.1341, beats_loss=0.01716, ecapa_loss=0.0004772, whisper_loss=0.1122, over 22022.00 frames. ], tot_loss[loss=0.136, beats_loss=0.01399, ecapa_loss=0.0007154, whisper_loss=0.1149, over 3860886.52 frames. ], batch size: 84, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:38:09,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.41 vs. limit=22.5 2024-08-09 13:38:13,708 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 13:38:16,967 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 13:38:22,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=23200.0, ans=0.0 2024-08-09 13:38:28,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23200.0, ans=0.1 2024-08-09 13:38:29,867 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 13:38:30,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=23200.0, ans=0.2 2024-08-09 13:38:31,189 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-09 13:38:39,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=23300.0, ans=0.125 2024-08-09 13:38:47,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=23300.0, ans=0.0 2024-08-09 13:38:48,806 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 36 from Vox, 27 fro AS 2024-08-09 13:38:49,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=23400.0, ans=0.2 2024-08-09 13:38:50,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23400.0, ans=0.1 2024-08-09 13:39:04,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2350, loss[loss=0.1169, beats_loss=0.01595, ecapa_loss=0.0007125, whisper_loss=0.09384, over 21104.00 frames. ], tot_loss[loss=0.1353, beats_loss=0.01396, ecapa_loss=0.0007106, whisper_loss=0.1142, over 3858991.38 frames. ], batch size: 88, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:39:11,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=23500.0, ans=0.0 2024-08-09 13:39:22,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=23600.0, ans=0.005739130434782609 2024-08-09 13:39:23,499 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 13:39:25,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2024-08-09 13:39:33,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=23600.0, ans=0.125 2024-08-09 13:39:35,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=23700.0, ans=0.1 2024-08-09 13:39:42,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=23700.0, ans=0.0 2024-08-09 13:39:50,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=15.0 2024-08-09 13:40:01,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=23800.0, ans=0.125 2024-08-09 13:40:18,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=23900.0, ans=0.005673913043478261 2024-08-09 13:40:24,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.811e+01 3.461e+01 4.504e+01 7.215e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-09 13:40:24,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2400, loss[loss=0.1189, beats_loss=0.009557, ecapa_loss=0.000768, whisper_loss=0.1016, over 16553.00 frames. ], tot_loss[loss=0.1349, beats_loss=0.01383, ecapa_loss=0.0006983, whisper_loss=0.1141, over 3855026.07 frames. ], batch size: 66, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:40:27,127 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 13:40:36,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=24000.0, ans=0.1 2024-08-09 13:40:56,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=24200.0, ans=0.2 2024-08-09 13:41:10,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-09 13:41:18,152 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-09 13:41:27,372 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 25 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-09 13:41:37,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-09 13:41:39,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2450, loss[loss=0.1181, beats_loss=0.01474, ecapa_loss=0.0007234, whisper_loss=0.09615, over 16717.00 frames. ], tot_loss[loss=0.1345, beats_loss=0.01379, ecapa_loss=0.0006881, whisper_loss=0.1139, over 3856119.78 frames. ], batch size: 68, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:41:44,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=24500.0, ans=0.125 2024-08-09 13:41:46,988 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 13:41:48,552 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 13:41:57,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. 
limit=15.0 2024-08-09 13:42:04,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24600.0, ans=0.1 2024-08-09 13:42:11,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=24700.0, ans=0.2 2024-08-09 13:42:11,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-09 13:42:19,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=24700.0, ans=0.125 2024-08-09 13:42:35,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.31 vs. limit=22.5 2024-08-09 13:42:36,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=24900.0, ans=0.125 2024-08-09 13:42:36,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=15.0 2024-08-09 13:42:45,339 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 13:42:50,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=24900.0, ans=0.0 2024-08-09 13:42:52,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.830e+01 3.469e+01 4.522e+01 1.002e+02, threshold=6.939e+01, percent-clipped=2.0 2024-08-09 13:42:52,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2500, loss[loss=0.1449, beats_loss=0.01414, ecapa_loss=0.0005923, whisper_loss=0.1248, over 14724.00 frames. 
], tot_loss[loss=0.1336, beats_loss=0.01379, ecapa_loss=0.0006798, whisper_loss=0.113, over 3813153.42 frames. ], batch size: 60, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:43:13,474 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 13:43:15,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=25100.0, ans=0.0054130434782608695 2024-08-09 13:43:16,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=25100.0, ans=0.2 2024-08-09 13:43:19,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=25100.0, ans=0.125 2024-08-09 13:43:32,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=25200.0, ans=0.125 2024-08-09 13:43:35,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=25200.0, ans=0.0 2024-08-09 13:43:42,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=25300.0, ans=0.125 2024-08-09 13:43:43,284 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.14 vs. limit=6.0 2024-08-09 13:43:43,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.67 vs. 
limit=22.5 2024-08-09 13:43:45,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=25300.0, ans=0.125 2024-08-09 13:43:48,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=25300.0, ans=0.04949747468305833 2024-08-09 13:43:53,443 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 13:44:08,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2550, loss[loss=0.12, beats_loss=0.01429, ecapa_loss=0.0007526, whisper_loss=0.09823, over 17732.00 frames. ], tot_loss[loss=0.1338, beats_loss=0.01386, ecapa_loss=0.0006683, whisper_loss=0.1132, over 3851213.46 frames. ], batch size: 75, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:44:16,575 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 13:44:18,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=25500.0, ans=0.125 2024-08-09 13:44:19,786 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-09 13:44:25,327 INFO [train_multi_KD3.py:844] (1/4) A total of 98 cuts. 27 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-09 13:44:27,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25600.0, ans=0.1 2024-08-09 13:44:31,718 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 13:44:32,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=25600.0, ans=0.005304347826086957 2024-08-09 13:44:46,747 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
23 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-09 13:45:09,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-09 13:45:11,846 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-09 13:45:12,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=25900.0, ans=0.125 2024-08-09 13:45:21,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 3.019e+01 3.579e+01 4.793e+01 1.038e+02, threshold=7.158e+01, percent-clipped=5.0 2024-08-09 13:45:21,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2600, loss[loss=0.09832, beats_loss=0.01666, ecapa_loss=0.0006054, whisper_loss=0.07561, over 17808.00 frames. ], tot_loss[loss=0.1333, beats_loss=0.01395, ecapa_loss=0.0006589, whisper_loss=0.1128, over 3853274.52 frames. ], batch size: 75, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:45:23,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=26000.0, ans=0.125 2024-08-09 13:45:30,007 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 13:45:36,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=26100.0, ans=0.005195652173913044 2024-08-09 13:45:40,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26100.0, ans=0.0 2024-08-09 13:45:46,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=26100.0, ans=0.2 2024-08-09 13:46:20,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=26400.0, ans=0.0 2024-08-09 13:46:32,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=26400.0, ans=0.125 2024-08-09 13:46:34,515 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2650, loss[loss=0.1052, beats_loss=0.01643, ecapa_loss=0.0006349, whisper_loss=0.08247, over 21852.00 frames. ], tot_loss[loss=0.1324, beats_loss=0.01398, ecapa_loss=0.0006499, whisper_loss=0.112, over 3864470.76 frames. ], batch size: 92, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:46:37,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=26500.0, ans=10.0 2024-08-09 13:46:45,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.54 vs. 
limit=22.5
2024-08-09 13:46:51,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26600.0, ans=0.0
2024-08-09 13:47:07,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=26700.0, ans=0.0
2024-08-09 13:47:09,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=26700.0, ans=0.0
2024-08-09 13:47:38,672 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS
2024-08-09 13:47:44,871 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 13 from Vox, 21 from AS
2024-08-09 13:47:47,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.863e+01 3.296e+01 3.949e+01 7.406e+01, threshold=6.593e+01, percent-clipped=2.0
2024-08-09 13:47:47,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2700, loss[loss=0.1279, beats_loss=0.0183, ecapa_loss=0.00053, whisper_loss=0.1043, over 23224.00 frames. ], tot_loss[loss=0.1314, beats_loss=0.014, ecapa_loss=0.0006498, whisper_loss=0.1109, over 3851849.00 frames. ], batch size: 92, lr: 4.36e-02, grad_scale: 64.0
2024-08-09 13:47:55,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0
2024-08-09 13:48:17,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0
2024-08-09 13:48:24,151 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS
2024-08-09 13:48:30,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27300.0, ans=0.125
2024-08-09 13:48:43,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=27300.0, ans=0.95
2024-08-09 13:48:53,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=27400.0, ans=0.125
2024-08-09 13:48:57,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=27400.0, ans=0.125
2024-08-09 13:49:01,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2750, loss[loss=0.1089, beats_loss=0.0134, ecapa_loss=0.000713, whisper_loss=0.08841, over 18217.00 frames. ], tot_loss[loss=0.1316, beats_loss=0.01395, ecapa_loss=0.0006459, whisper_loss=0.1112, over 3826615.82 frames. ], batch size: 76, lr: 4.36e-02, grad_scale: 64.0
2024-08-09 13:49:02,818 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 from AS
2024-08-09 13:49:14,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=27500.0, ans=0.0
2024-08-09 13:49:16,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=27600.0, ans=0.125
2024-08-09 13:49:17,825 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 13:49:19,676 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=12.0
2024-08-09 13:49:28,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=12.0
2024-08-09 13:49:30,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0
2024-08-09 13:49:32,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=27700.0, ans=0.125
2024-08-09 13:49:39,377 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS
2024-08-09 13:50:02,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27900.0, ans=0.125
2024-08-09 13:50:02,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=27900.0, ans=0.5
2024-08-09 13:50:03,925 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-09 13:50:05,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=12.0
2024-08-09 13:50:07,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0
2024-08-09 13:50:18,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=28000.0, ans=0.2
2024-08-09 13:50:19,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.874e+01 3.420e+01 4.195e+01 6.815e+01, threshold=6.839e+01, percent-clipped=2.0
2024-08-09 13:50:19,620 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2800, loss[loss=0.1611, beats_loss=0.01158, ecapa_loss=0.0007277, whisper_loss=0.1422, over 21386.00 frames. ], tot_loss[loss=0.132, beats_loss=0.0139, ecapa_loss=0.0006426, whisper_loss=0.1117, over 3845377.49 frames. ], batch size: 85, lr: 4.36e-02, grad_scale: 64.0
2024-08-09 13:50:19,827 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 from AS
2024-08-09 13:50:29,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=28000.0, ans=0.0
2024-08-09 13:50:33,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0
2024-08-09 13:50:48,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=28100.0, ans=0.125
2024-08-09 13:51:10,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28300.0, ans=0.125
2024-08-09 13:51:12,003 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 from AS
2024-08-09 13:51:16,674 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 from AS
2024-08-09 13:51:18,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=28300.0, ans=0.125
2024-08-09 13:51:26,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28400.0, ans=0.1
2024-08-09 13:51:27,162 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 from AS
2024-08-09 13:51:32,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=28400.0, ans=0.125
2024-08-09 13:51:36,405 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS
2024-08-09 13:51:38,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2850, loss[loss=0.12, beats_loss=0.01227, ecapa_loss=0.0006793, whisper_loss=0.101, over 13389.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.0139, ecapa_loss=0.0006391, whisper_loss=0.1118, over 3885836.44 frames. ], batch size: 58, lr: 4.35e-02, grad_scale: 64.0
2024-08-09 13:51:38,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28500.0, ans=0.0
2024-08-09 13:51:40,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.64 vs. limit=22.5
2024-08-09 13:51:51,378 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 from AS
2024-08-09 13:51:51,637 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-09 13:52:20,926 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 from AS
2024-08-09 13:53:00,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 3.002e+01 3.706e+01 4.572e+01 7.980e+01, threshold=7.411e+01, percent-clipped=5.0
2024-08-09 13:53:00,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2900, loss[loss=0.1489, beats_loss=0.01324, ecapa_loss=0.0007173, whisper_loss=0.1285, over 16215.00 frames. ], tot_loss[loss=0.1322, beats_loss=0.01395, ecapa_loss=0.0006358, whisper_loss=0.1119, over 3877940.01 frames. ], batch size: 68, lr: 4.35e-02, grad_scale: 64.0
2024-08-09 13:53:09,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=29000.0, ans=0.0
2024-08-09 13:53:15,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=15.0
2024-08-09 13:53:20,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.94 vs. limit=22.5
2024-08-09 13:53:21,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29100.0, ans=0.1
2024-08-09 13:53:50,042 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 38 from LS+wenet, 16 from Vox, 37 from AS
2024-08-09 13:53:51,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=29300.0, ans=0.125
2024-08-09 13:53:56,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0
2024-08-09 13:54:10,356 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-09 13:54:19,854 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 2950, loss[loss=0.1173, beats_loss=0.01468, ecapa_loss=0.0005532, whisper_loss=0.09706, over 14681.00 frames. ], tot_loss[loss=0.1322, beats_loss=0.01391, ecapa_loss=0.0006294, whisper_loss=0.112, over 3834923.11 frames. ], batch size: 56, lr: 4.34e-02, grad_scale: 64.0
2024-08-09 13:54:34,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=29600.0, ans=0.125
2024-08-09 13:54:57,901 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS
2024-08-09 13:55:21,681 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 22 from Vox, 18 from AS
2024-08-09 13:55:32,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29900.0, ans=0.1
2024-08-09 13:55:35,489 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS
2024-08-09 13:55:35,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=29900.0, ans=0.5
2024-08-09 13:55:39,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 3.111e+01 3.701e+01 4.234e+01 7.297e+01, threshold=7.402e+01, percent-clipped=0.0
2024-08-09 13:55:39,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3000, loss[loss=0.1467, beats_loss=0.01086, ecapa_loss=0.0007268, whisper_loss=0.1286, over 13038.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.01383, ecapa_loss=0.0006237, whisper_loss=0.112, over 3860038.17 frames. ], batch size: 53, lr: 4.34e-02, grad_scale: 64.0
2024-08-09 13:55:39,271 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-09 13:56:23,658 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on ASR_libri: loss=0.3107, beats_loss=0, ecapa_loss=0.001585, whisper_loss=0.2948, over 922467.00 frames.
2024-08-09 13:56:41,523 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on SV_voxceleb1: loss=0.0159, beats_loss=0, ecapa_loss=0.00159, whisper_loss=0, over 939242.00 frames.
2024-08-09 13:58:39,709 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on AT_audioset: loss=0.03327, beats_loss=0.03327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-09 13:58:39,713 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-09 13:58:49,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=30000.0, ans=0.004347826086956522
2024-08-09 13:58:53,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=30000.0, ans=0.125
2024-08-09 13:58:56,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=30100.0, ans=0.0
2024-08-09 13:59:02,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=30100.0, ans=0.07
2024-08-09 13:59:04,069 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 from AS
2024-08-09 13:59:09,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=30100.0, ans=0.125
2024-08-09 13:59:18,706 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 from AS
2024-08-09 13:59:29,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=30300.0, ans=0.125
2024-08-09 13:59:36,847 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 38 from LS+wenet, 21 from Vox, 29 from AS
2024-08-09 13:59:38,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30300.0, ans=0.125
2024-08-09 13:59:41,608 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS
2024-08-09 13:59:46,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=30400.0, ans=15.0
2024-08-09 13:59:56,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=30400.0, ans=0.05
2024-08-09 14:00:02,773 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 25 from Vox, 43 from AS
2024-08-09 14:00:02,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=30500.0, ans=0.035
2024-08-09 14:00:04,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3050, loss[loss=0.1042, beats_loss=0.01634, ecapa_loss=0.0005539, whisper_loss=0.08231, over 20961.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01387, ecapa_loss=0.0006166, whisper_loss=0.1125, over 3884070.48 frames. ], batch size: 87, lr: 4.33e-02, grad_scale: 64.0
2024-08-09 14:00:04,911 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-09 14:00:15,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=15.0
2024-08-09 14:00:24,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=15.0
2024-08-09 14:00:45,540 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 from AS
2024-08-09 14:00:51,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=30800.0, ans=0.2
2024-08-09 14:01:02,408 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 from AS
2024-08-09 14:01:16,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0
2024-08-09 14:01:18,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 3.101e+01 3.734e+01 4.761e+01 9.232e+01, threshold=7.468e+01, percent-clipped=3.0
2024-08-09 14:01:18,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3100, loss[loss=0.1178, beats_loss=0.01355, ecapa_loss=0.0005201, whisper_loss=0.09901, over 14205.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01379, ecapa_loss=0.0006149, whisper_loss=0.1125, over 3892609.63 frames. ], batch size: 55, lr: 4.33e-02, grad_scale: 64.0
2024-08-09 14:01:18,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0
2024-08-09 14:01:18,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0
2024-08-09 14:01:21,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=31000.0, ans=0.004130434782608696
2024-08-09 14:01:22,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31000.0, ans=0.0
2024-08-09 14:01:24,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31000.0, ans=0.125
2024-08-09 14:01:31,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0
2024-08-09 14:01:34,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0
2024-08-09 14:01:45,753 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 21 from Vox, 37 from AS
2024-08-09 14:01:47,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=31200.0, ans=0.2
2024-08-09 14:01:52,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=31200.0, ans=0.2
2024-08-09 14:01:55,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=31200.0, ans=0.125
2024-08-09 14:02:00,126 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 38 from LS+wenet, 19 from Vox, 34 from AS
2024-08-09 14:02:13,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5
2024-08-09 14:02:20,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=31400.0, ans=0.125
2024-08-09 14:02:23,757 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3150, loss[loss=0.1204, beats_loss=0.01332, ecapa_loss=0.0006998, whisper_loss=0.1001, over 21323.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.01372, ecapa_loss=0.0006095, whisper_loss=0.1123, over 3903538.13 frames. ], batch size: 90, lr: 4.32e-02, grad_scale: 64.0
2024-08-09 14:02:25,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=31500.0, ans=0.1
2024-08-09 14:02:25,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=31500.0, ans=0.0
2024-08-09 14:02:29,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0
2024-08-09 14:02:40,136 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 14:02:43,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=31600.0, ans=0.0
2024-08-09 14:02:44,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=31600.0, ans=0.125
2024-08-09 14:02:49,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31700.0, ans=0.125
2024-08-09 14:02:57,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=31700.0, ans=0.125
2024-08-09 14:03:08,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=31800.0, ans=0.0
2024-08-09 14:03:19,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=31900.0, ans=0.125
2024-08-09 14:03:30,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 3.005e+01 3.440e+01 4.161e+01 7.835e+01, threshold=6.880e+01, percent-clipped=1.0
2024-08-09 14:03:30,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3200, loss[loss=0.1518, beats_loss=0.01228, ecapa_loss=0.0004973, whisper_loss=0.1346, over 22128.00 frames. ], tot_loss[loss=0.1323, beats_loss=0.01364, ecapa_loss=0.0006042, whisper_loss=0.1126, over 3902246.79 frames. ], batch size: 80, lr: 4.32e-02, grad_scale: 64.0
2024-08-09 14:03:33,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32000.0, ans=0.125
2024-08-09 14:03:56,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5
2024-08-09 14:04:11,604 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS
2024-08-09 14:04:16,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32300.0, ans=0.125
2024-08-09 14:04:18,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32300.0, ans=0.1
2024-08-09 14:04:36,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3250, loss[loss=0.1142, beats_loss=0.01481, ecapa_loss=0.0005666, whisper_loss=0.09376, over 19617.00 frames. ], tot_loss[loss=0.1316, beats_loss=0.01369, ecapa_loss=0.0005999, whisper_loss=0.1119, over 3885618.24 frames. ], batch size: 78, lr: 4.31e-02, grad_scale: 64.0
2024-08-09 14:04:38,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=32500.0, ans=0.0
2024-08-09 14:05:02,848 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 from AS
2024-08-09 14:05:04,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=32700.0, ans=0.0
2024-08-09 14:05:14,561 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 from AS
2024-08-09 14:05:16,043 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS
2024-08-09 14:05:26,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=32800.0, ans=0.125
2024-08-09 14:05:33,062 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-09 14:05:34,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=32900.0, ans=0.125
2024-08-09 14:05:42,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.060e+01 3.523e+01 4.253e+01 9.588e+01, threshold=7.047e+01, percent-clipped=8.0
2024-08-09 14:05:42,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3300, loss[loss=0.1543, beats_loss=0.01152, ecapa_loss=0.000514, whisper_loss=0.1376, over 18209.00 frames. ], tot_loss[loss=0.1308, beats_loss=0.01373, ecapa_loss=0.0005961, whisper_loss=0.1111, over 3880574.32 frames. ], batch size: 68, lr: 4.31e-02, grad_scale: 64.0
2024-08-09 14:05:45,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=33000.0, ans=0.0
2024-08-09 14:05:52,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0
2024-08-09 14:06:06,269 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 12 from LS+wenet, 18 from Vox, 32 from AS
2024-08-09 14:06:10,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=33200.0, ans=0.2
2024-08-09 14:06:22,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0
2024-08-09 14:06:33,206 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 31 from Vox, 26 from AS
2024-08-09 14:06:34,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=33400.0, ans=0.2
2024-08-09 14:06:47,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3350, loss[loss=0.1386, beats_loss=0.01316, ecapa_loss=0.0006334, whisper_loss=0.1191, over 21561.00 frames. ], tot_loss[loss=0.131, beats_loss=0.0136, ecapa_loss=0.0005932, whisper_loss=0.1115, over 3888427.95 frames. ], batch size: 89, lr: 4.30e-02, grad_scale: 64.0
2024-08-09 14:06:55,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=33500.0, ans=10.0
2024-08-09 14:07:18,439 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 33 from LS+wenet, 21 from Vox, 29 from AS
2024-08-09 14:07:26,667 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 from AS
2024-08-09 14:07:26,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=33800.0, ans=0.2
2024-08-09 14:07:31,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2024-08-09 14:07:43,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5
2024-08-09 14:07:43,688 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 from AS
2024-08-09 14:07:43,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=33900.0, ans=0.5
2024-08-09 14:07:53,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 3.123e+01 3.529e+01 4.678e+01 1.147e+02, threshold=7.058e+01, percent-clipped=6.0
2024-08-09 14:07:53,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3400, loss[loss=0.1173, beats_loss=0.01401, ecapa_loss=0.0005464, whisper_loss=0.09787, over 18506.00 frames. ], tot_loss[loss=0.1309, beats_loss=0.01353, ecapa_loss=0.0005896, whisper_loss=0.1115, over 3898954.55 frames. ], batch size: 73, lr: 4.29e-02, grad_scale: 64.0
2024-08-09 14:08:06,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=34100.0, ans=0.2
2024-08-09 14:08:12,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=34100.0, ans=0.125
2024-08-09 14:08:13,281 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 from AS
2024-08-09 14:08:22,185 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 30 from LS+wenet, 16 from Vox, 28 from AS
2024-08-09 14:08:44,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=34400.0, ans=0.125
2024-08-09 14:08:56,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0
2024-08-09 14:08:57,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3450, loss[loss=0.1166, beats_loss=0.01718, ecapa_loss=0.0004887, whisper_loss=0.09451, over 22701.00 frames. ], tot_loss[loss=0.1305, beats_loss=0.01347, ecapa_loss=0.0005871, whisper_loss=0.1112, over 3903180.58 frames. ], batch size: 93, lr: 4.29e-02, grad_scale: 64.0
2024-08-09 14:09:05,532 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS
2024-08-09 14:09:08,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34500.0, ans=0.1
2024-08-09 14:09:12,097 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS
2024-08-09 14:09:16,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=34600.0, ans=0.125
2024-08-09 14:09:23,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.82 vs. limit=22.5
2024-08-09 14:09:47,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=12.0
2024-08-09 14:09:48,118 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 24 from Vox, 18 from AS
2024-08-09 14:09:56,520 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 from AS
2024-08-09 14:10:02,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.921e+01 3.468e+01 4.313e+01 8.519e+01, threshold=6.936e+01, percent-clipped=1.0
2024-08-09 14:10:02,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3500, loss[loss=0.1085, beats_loss=0.01616, ecapa_loss=0.0006036, whisper_loss=0.08631, over 22521.00 frames. ], tot_loss[loss=0.1301, beats_loss=0.01354, ecapa_loss=0.0005838, whisper_loss=0.1107, over 3887634.20 frames. ], batch size: 93, lr: 4.28e-02, grad_scale: 64.0
2024-08-09 14:10:05,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=35000.0, ans=0.125
2024-08-09 14:10:08,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=35000.0, ans=0.0032608695652173916
2024-08-09 14:10:21,224 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS
2024-08-09 14:10:24,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0
2024-08-09 14:10:36,028 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 from AS
2024-08-09 14:10:42,382 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 from AS
2024-08-09 14:10:47,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=35300.0, ans=0.2
2024-08-09 14:10:55,310 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 from AS
2024-08-09 14:10:59,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35400.0, ans=0.125
2024-08-09 14:11:03,176 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS
2024-08-09 14:11:07,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3550, loss[loss=0.1136, beats_loss=0.0139, ecapa_loss=0.0005098, whisper_loss=0.09457, over 19242.00 frames. ], tot_loss[loss=0.1301, beats_loss=0.01349, ecapa_loss=0.0005811, whisper_loss=0.1108, over 3880628.08 frames. ], batch size: 75, lr: 4.28e-02, grad_scale: 64.0
2024-08-09 14:11:09,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=35500.0, ans=0.125
2024-08-09 14:11:14,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=35500.0, ans=0.04949747468305833
2024-08-09 14:11:22,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=35600.0, ans=0.125
2024-08-09 14:11:27,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=35600.0, ans=0.125
2024-08-09 14:11:33,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=35700.0, ans=0.003108695652173913
2024-08-09 14:11:39,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=35700.0, ans=0.125
2024-08-09 14:11:41,709 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS
2024-08-09 14:11:44,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=35700.0, ans=15.0
2024-08-09 14:11:51,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5
2024-08-09 14:11:54,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=35800.0, ans=0.0030869565217391303
2024-08-09 14:11:59,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0
2024-08-09 14:12:05,510 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2024-08-09 14:12:10,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=35900.0, ans=0.0030652173913043477
2024-08-09 14:12:13,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 3.146e+01 3.821e+01 4.721e+01 1.022e+02, threshold=7.642e+01, percent-clipped=5.0
2024-08-09 14:12:13,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3600, loss[loss=0.1373, beats_loss=0.0137, ecapa_loss=0.0006749, whisper_loss=0.1169, over 14121.00 frames. ], tot_loss[loss=0.1293, beats_loss=0.01357, ecapa_loss=0.0005778, whisper_loss=0.1099, over 3901085.30 frames. ], batch size: 61, lr: 4.27e-02, grad_scale: 64.0
2024-08-09 14:12:16,418 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 from AS
2024-08-09 14:12:43,114 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-09 14:12:48,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=36200.0, ans=0.2
2024-08-09 14:12:57,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=36300.0, ans=0.125
2024-08-09 14:13:06,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=22.5
2024-08-09 14:13:10,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=36400.0, ans=0.0
2024-08-09 14:13:13,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.49 vs. limit=22.5
2024-08-09 14:13:19,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3650, loss[loss=0.1041, beats_loss=0.01615, ecapa_loss=0.0004328, whisper_loss=0.08357, over 18353.00 frames. ], tot_loss[loss=0.1292, beats_loss=0.01361, ecapa_loss=0.0005694, whisper_loss=0.1099, over 3874246.29 frames. ], batch size: 71, lr: 4.27e-02, grad_scale: 64.0
2024-08-09 14:13:24,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=36500.0, ans=0.125
2024-08-09 14:13:42,030 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 from AS
2024-08-09 14:14:08,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=36800.0, ans=0.0
2024-08-09 14:14:23,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.25 vs. limit=22.5
2024-08-09 14:14:24,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.925e+01 3.373e+01 4.021e+01 6.000e+01, threshold=6.747e+01, percent-clipped=0.0
2024-08-09 14:14:24,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3700, loss[loss=0.06015, beats_loss=0.01861, ecapa_loss=0.0004567, whisper_loss=0.03697, over 13219.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.01361, ecapa_loss=0.000568, whisper_loss=0.1092, over 3867465.56 frames. ], batch size: 55, lr: 4.26e-02, grad_scale: 64.0
2024-08-09 14:14:51,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=37200.0, ans=0.0
2024-08-09 14:14:57,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.71 vs. limit=22.5
2024-08-09 14:15:04,876 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-09 14:15:19,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=37400.0, ans=0.125
2024-08-09 14:15:25,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=37400.0, ans=0.125
2024-08-09 14:15:30,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3750, loss[loss=0.1331, beats_loss=0.01458, ecapa_loss=0.0006471, whisper_loss=0.1121, over 20318.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01355, ecapa_loss=0.0005692, whisper_loss=0.1098, over 3857386.13 frames. ], batch size: 89, lr: 4.26e-02, grad_scale: 64.0
2024-08-09 14:15:37,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=37500.0, ans=0.125
2024-08-09 14:15:47,090 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 from AS
2024-08-09 14:15:48,243 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 from AS
2024-08-09 14:15:56,532 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 16 from Vox, 20 from AS
2024-08-09 14:15:56,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=37700.0, ans=0.125
2024-08-09 14:16:01,724 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 from AS
2024-08-09 14:16:03,009 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 14:16:09,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=37800.0, ans=0.0 2024-08-09 14:16:19,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=37800.0, ans=0.2 2024-08-09 14:16:20,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=37800.0, ans=0.1 2024-08-09 14:16:32,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=37900.0, ans=0.0 2024-08-09 14:16:36,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.197e+01 3.801e+01 4.581e+01 9.571e+01, threshold=7.603e+01, percent-clipped=5.0 2024-08-09 14:16:36,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3800, loss[loss=0.1415, beats_loss=0.01581, ecapa_loss=0.0004768, whisper_loss=0.1209, over 18726.00 frames. ], tot_loss[loss=0.1294, beats_loss=0.01361, ecapa_loss=0.0005683, whisper_loss=0.1101, over 3871579.41 frames. ], batch size: 74, lr: 4.25e-02, grad_scale: 64.0 2024-08-09 14:16:46,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.84 vs. limit=22.5 2024-08-09 14:17:14,452 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 14:17:35,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=38400.0, ans=0.2 2024-08-09 14:17:41,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3850, loss[loss=0.1118, beats_loss=0.01178, ecapa_loss=0.0007269, whisper_loss=0.09279, over 13732.00 frames. ], tot_loss[loss=0.1292, beats_loss=0.0136, ecapa_loss=0.0005656, whisper_loss=0.1099, over 3848901.36 frames. 
], batch size: 57, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:17:55,582 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-09 14:18:18,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=38700.0, ans=0.0024565217391304354 2024-08-09 14:18:20,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=38700.0, ans=0.125 2024-08-09 14:18:22,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2024-08-09 14:18:27,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-09 14:18:28,328 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 14:18:29,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2024-08-09 14:18:39,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.91 vs. 
limit=15.0 2024-08-09 14:18:44,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=38900.0, ans=0.125 2024-08-09 14:18:44,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=38900.0, ans=0.0 2024-08-09 14:18:49,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 3.021e+01 3.699e+01 4.570e+01 7.428e+01, threshold=7.398e+01, percent-clipped=0.0 2024-08-09 14:18:49,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3900, loss[loss=0.12, beats_loss=0.01437, ecapa_loss=0.0004929, whisper_loss=0.1007, over 18958.00 frames. ], tot_loss[loss=0.1298, beats_loss=0.01359, ecapa_loss=0.0005627, whisper_loss=0.1106, over 3889265.45 frames. ], batch size: 76, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:18:50,717 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-09 14:18:52,008 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 14:18:53,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=39000.0, ans=0.125 2024-08-09 14:18:55,994 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 14:19:09,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39100.0, ans=0.125 2024-08-09 14:19:18,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=39200.0, ans=0.0023478260869565218 2024-08-09 14:19:28,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=39300.0, ans=0.125 2024-08-09 14:19:33,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39300.0, ans=0.1 2024-08-09 14:19:53,876 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 3950, loss[loss=0.1205, beats_loss=0.0192, ecapa_loss=0.0004519, whisper_loss=0.09682, over 22259.00 frames. ], tot_loss[loss=0.1301, beats_loss=0.01356, ecapa_loss=0.0005588, whisper_loss=0.1109, over 3904619.28 frames. ], batch size: 91, lr: 4.23e-02, grad_scale: 64.0 2024-08-09 14:20:14,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39600.0, ans=0.125 2024-08-09 14:20:21,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=39600.0, ans=0.5 2024-08-09 14:20:21,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0 2024-08-09 14:20:27,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.62 vs. 
limit=22.5 2024-08-09 14:20:32,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=39700.0, ans=0.0 2024-08-09 14:20:36,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=39700.0, ans=0.125 2024-08-09 14:20:58,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=39900.0, ans=0.125 2024-08-09 14:21:00,157 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.130e-02 2024-08-09 14:21:07,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=39900.0, ans=0.125 2024-08-09 14:21:12,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.104e+01 3.769e+01 4.628e+01 7.300e+01, threshold=7.538e+01, percent-clipped=0.0 2024-08-09 14:21:12,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4000, loss[loss=0.1025, beats_loss=0.01553, ecapa_loss=0.0004166, whisper_loss=0.08281, over 15007.00 frames. ], tot_loss[loss=0.1304, beats_loss=0.0135, ecapa_loss=0.0005551, whisper_loss=0.1114, over 3914364.81 frames. ], batch size: 58, lr: 4.23e-02, grad_scale: 128.0 2024-08-09 14:21:31,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.45 vs. 
limit=12.0 2024-08-09 14:21:47,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=40200.0, ans=0.05 2024-08-09 14:21:49,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=40200.0, ans=0.0021304347826086954 2024-08-09 14:22:09,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.28 vs. limit=22.5 2024-08-09 14:22:17,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=15.0 2024-08-09 14:22:20,321 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-09 14:22:23,211 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4050, loss[loss=0.1102, beats_loss=0.01207, ecapa_loss=0.0005778, whisper_loss=0.09233, over 16458.00 frames. ], tot_loss[loss=0.1306, beats_loss=0.01345, ecapa_loss=0.0005538, whisper_loss=0.1116, over 3912086.25 frames. ], batch size: 69, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:22:27,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=40500.0, ans=0.0020652173913043477 2024-08-09 14:22:30,277 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.086e+00 2024-08-09 14:22:34,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=40500.0, ans=0.0 2024-08-09 14:22:34,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.38 vs. 
limit=15.0 2024-08-09 14:22:45,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40600.0, ans=0.1 2024-08-09 14:23:11,761 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 14:23:17,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=40900.0, ans=0.0 2024-08-09 14:23:20,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40900.0, ans=0.1 2024-08-09 14:23:28,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.975e+01 3.511e+01 4.257e+01 6.601e+01, threshold=7.021e+01, percent-clipped=0.0 2024-08-09 14:23:28,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4100, loss[loss=0.1225, beats_loss=0.01493, ecapa_loss=0.0005826, whisper_loss=0.1018, over 21743.00 frames. ], tot_loss[loss=0.1299, beats_loss=0.01346, ecapa_loss=0.0005476, whisper_loss=0.111, over 3904505.99 frames. 
], batch size: 93, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:23:32,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41000.0, ans=0.125 2024-08-09 14:23:47,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=41100.0, ans=0.0 2024-08-09 14:24:02,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=41200.0, ans=0.0 2024-08-09 14:24:09,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:24:13,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=41300.0, ans=15.0 2024-08-09 14:24:20,469 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 14:24:20,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=41400.0, ans=0.05 2024-08-09 14:24:33,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4150, loss[loss=0.1313, beats_loss=0.0152, ecapa_loss=0.0005741, whisper_loss=0.1103, over 21060.00 frames. ], tot_loss[loss=0.1291, beats_loss=0.01349, ecapa_loss=0.0005449, whisper_loss=0.1102, over 3884287.76 frames. ], batch size: 90, lr: 4.21e-02, grad_scale: 128.0 2024-08-09 14:24:37,493 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 14:24:39,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-09 14:24:54,312 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 14:25:11,022 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 14:25:11,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41800.0, ans=0.1 2024-08-09 14:25:12,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=41800.0, ans=0.125 2024-08-09 14:25:13,580 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 14:25:17,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=41800.0, ans=0.0017826086956521745 2024-08-09 14:25:22,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=41800.0, ans=0.0 2024-08-09 14:25:26,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.30 vs. limit=22.5 2024-08-09 14:25:32,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=41900.0, ans=0.125 2024-08-09 14:25:37,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.939e+01 3.388e+01 4.308e+01 6.716e+01, threshold=6.777e+01, percent-clipped=0.0 2024-08-09 14:25:37,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4200, loss[loss=0.1096, beats_loss=0.01605, ecapa_loss=0.0005383, whisper_loss=0.08815, over 18438.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01343, ecapa_loss=0.0005428, whisper_loss=0.1098, over 3881978.28 frames. ], batch size: 76, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:25:39,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. 
limit=12.0 2024-08-09 14:25:55,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-09 14:25:58,548 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 14:25:58,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42100.0, ans=0.1 2024-08-09 14:26:09,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.88 vs. limit=22.5 2024-08-09 14:26:14,102 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 14:26:29,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2024-08-09 14:26:41,811 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4250, loss[loss=0.1125, beats_loss=0.01606, ecapa_loss=0.0005154, whisper_loss=0.09133, over 16655.00 frames. ], tot_loss[loss=0.1277, beats_loss=0.01341, ecapa_loss=0.0005403, whisper_loss=0.1089, over 3882850.74 frames. ], batch size: 70, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:26:43,428 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 14:26:52,567 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 14:26:55,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=42600.0, ans=0.001608695652173914 2024-08-09 14:27:00,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. 
limit=15.0 2024-08-09 14:27:05,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=42600.0, ans=0.125 2024-08-09 14:27:18,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.32 vs. limit=6.0 2024-08-09 14:27:25,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.87 vs. limit=22.5 2024-08-09 14:27:38,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=42900.0, ans=0.125 2024-08-09 14:27:41,653 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-09 14:27:46,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 3.006e+01 3.697e+01 4.408e+01 8.760e+01, threshold=7.393e+01, percent-clipped=1.0 2024-08-09 14:27:46,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4300, loss[loss=0.1343, beats_loss=0.01062, ecapa_loss=0.0005139, whisper_loss=0.1186, over 14280.00 frames. ], tot_loss[loss=0.1267, beats_loss=0.01341, ecapa_loss=0.0005373, whisper_loss=0.1079, over 3868176.15 frames. ], batch size: 57, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:09,360 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 14:28:10,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=43100.0, ans=0.125 2024-08-09 14:28:28,706 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 14:28:37,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=43400.0, ans=0.001434782608695652 2024-08-09 14:28:51,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4350, loss[loss=0.1231, beats_loss=0.01422, ecapa_loss=0.0005376, whisper_loss=0.1035, over 18066.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01349, ecapa_loss=0.0005325, whisper_loss=0.1074, over 3869228.61 frames. ], batch size: 73, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:52,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=43500.0, ans=0.0 2024-08-09 14:29:08,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-08-09 14:29:14,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=43600.0, ans=0.1 2024-08-09 14:29:26,905 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 14:29:30,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-08-09 14:29:47,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=43900.0, ans=0.125 2024-08-09 14:29:49,783 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 14:29:50,898 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 14:29:57,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.950e+01 3.412e+01 4.173e+01 7.476e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-09 14:29:57,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4400, loss[loss=0.1576, beats_loss=0.01151, ecapa_loss=0.0005165, whisper_loss=0.1409, over 23270.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.01352, ecapa_loss=0.0005299, whisper_loss=0.1077, over 3853055.58 frames. ], batch size: 86, lr: 4.18e-02, grad_scale: 128.0 2024-08-09 14:29:59,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=44000.0, ans=0.125 2024-08-09 14:30:06,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=44000.0, ans=0.07 2024-08-09 14:30:07,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.94 vs. limit=22.5 2024-08-09 14:30:08,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=44000.0, ans=0.125 2024-08-09 14:30:44,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=44200.0, ans=0.125 2024-08-09 14:31:22,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4450, loss[loss=0.1175, beats_loss=0.01213, ecapa_loss=0.0004775, whisper_loss=0.1006, over 17496.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.01355, ecapa_loss=0.0005251, whisper_loss=0.1077, over 3836340.51 frames. 
], batch size: 68, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:31:25,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44500.0, ans=0.1 2024-08-09 14:31:50,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.57 vs. limit=6.0 2024-08-09 14:32:07,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=44700.0, ans=0.0011521739130434788 2024-08-09 14:32:31,577 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 14:32:48,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 3.039e+01 3.733e+01 4.656e+01 8.279e+01, threshold=7.465e+01, percent-clipped=2.0 2024-08-09 14:32:48,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4500, loss[loss=0.1107, beats_loss=0.01625, ecapa_loss=0.0003833, whisper_loss=0.09064, over 17809.00 frames. ], tot_loss[loss=0.1272, beats_loss=0.01353, ecapa_loss=0.0005235, whisper_loss=0.1085, over 3850557.07 frames. ], batch size: 69, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:32:51,872 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 14:32:53,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45000.0, ans=0.1 2024-08-09 14:32:59,782 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 14:33:15,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=45100.0, ans=0.015 2024-08-09 14:33:16,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=45100.0, ans=0.125 2024-08-09 14:33:23,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=45200.0, ans=0.0 2024-08-09 14:33:28,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=27.04 vs. limit=22.5 2024-08-09 14:33:33,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45200.0, ans=0.1 2024-08-09 14:33:34,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=45200.0, ans=0.0 2024-08-09 14:33:38,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=15.0 2024-08-09 14:33:48,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=45300.0, ans=0.0 2024-08-09 14:33:58,785 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 14:34:01,904 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 14:34:11,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4550, loss[loss=0.1171, beats_loss=0.01502, ecapa_loss=0.0004267, whisper_loss=0.09778, over 22167.00 frames. ], tot_loss[loss=0.1271, beats_loss=0.01357, ecapa_loss=0.0005229, whisper_loss=0.1083, over 3904490.55 frames. 
], batch size: 89, lr: 4.16e-02, grad_scale: 128.0 2024-08-09 14:34:14,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=45500.0, ans=0.07 2024-08-09 14:34:22,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=45500.0, ans=0.0009782608695652183 2024-08-09 14:34:26,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45600.0, ans=0.1 2024-08-09 14:34:33,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=45600.0, ans=10.0 2024-08-09 14:34:54,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=45700.0, ans=0.2 2024-08-09 14:34:56,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=45700.0, ans=0.2 2024-08-09 14:35:06,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45800.0, ans=0.1 2024-08-09 14:35:13,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=45800.0, ans=0.2 2024-08-09 14:35:13,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=45800.0, ans=0.2 2024-08-09 14:35:31,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=46000.0, ans=0.0008695652173913038 2024-08-09 14:35:32,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.954e+01 3.369e+01 4.036e+01 7.171e+01, threshold=6.737e+01, percent-clipped=0.0 2024-08-09 14:35:32,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4600, loss[loss=0.07792, beats_loss=0.01597, ecapa_loss=0.000509, 
whisper_loss=0.05686, over 16791.00 frames. ], tot_loss[loss=0.127, beats_loss=0.01351, ecapa_loss=0.0005229, whisper_loss=0.1082, over 3901667.78 frames. ], batch size: 69, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:35:36,026 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 14:35:36,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-09 14:35:45,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-09 14:35:57,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=46100.0, ans=0.0008478260869565213 2024-08-09 14:35:59,805 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 14:36:01,361 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 14:36:02,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=46200.0, ans=0.5 2024-08-09 14:36:18,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=46300.0, ans=0.04949747468305833 2024-08-09 14:36:39,621 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 14:36:44,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=46400.0, ans=0.025 2024-08-09 14:36:50,697 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 29 from Vox, 31 from AS 2024-08-09 14:36:54,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4650, loss[loss=0.1321, beats_loss=0.01383, ecapa_loss=0.0005205, whisper_loss=0.1131, over 21914.00 frames. ], tot_loss[loss=0.1263, beats_loss=0.01356, ecapa_loss=0.0005245, whisper_loss=0.1075, over 3910959.36 frames. ], batch size: 90, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:36:57,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2024-08-09 14:37:00,118 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-09 14:37:10,488 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 14:37:36,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=46700.0, ans=0.000717391304347826 2024-08-09 14:37:38,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-08-09 14:37:43,362 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-09 14:37:49,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=46800.0, ans=0.0006956521739130434 2024-08-09 14:37:59,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46800.0, ans=0.1 2024-08-09 14:38:01,661 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts.
21 from LS+wenet, 18 from Vox, 26 from AS 2024-08-09 14:38:04,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=46900.0, ans=0.125 2024-08-09 14:38:08,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=46900.0, ans=0.0 2024-08-09 14:38:12,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=46900.0, ans=0.0 2024-08-09 14:38:17,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 3.046e+01 3.609e+01 4.617e+01 7.306e+01, threshold=7.217e+01, percent-clipped=2.0 2024-08-09 14:38:17,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4700, loss[loss=0.1381, beats_loss=0.01366, ecapa_loss=0.0005145, whisper_loss=0.1193, over 15266.00 frames. ], tot_loss[loss=0.1269, beats_loss=0.01353, ecapa_loss=0.0005203, whisper_loss=0.1081, over 3889447.30 frames. ], batch size: 57, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:38:29,398 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.834e-01 2024-08-09 14:38:32,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.14 vs.
limit=12.0 2024-08-09 14:38:32,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47000.0, ans=0.0 2024-08-09 14:39:06,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47200.0, ans=0.125 2024-08-09 14:39:13,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=47300.0, ans=0.125 2024-08-09 14:39:17,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=47300.0, ans=0.0005869565217391307 2024-08-09 14:39:22,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.68 vs. limit=22.5 2024-08-09 14:39:24,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=47300.0, ans=0.125 2024-08-09 14:39:43,283 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4750, loss[loss=0.1091, beats_loss=0.01446, ecapa_loss=0.0003744, whisper_loss=0.09085, over 20724.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.0136, ecapa_loss=0.0005173, whisper_loss=0.1076, over 3898188.36 frames. ], batch size: 80, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:39:48,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=47500.0, ans=0.0005434782608695655 2024-08-09 14:39:56,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=47500.0, ans=0.125 2024-08-09 14:40:23,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.04 vs. 
limit=15.0 2024-08-09 14:40:54,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47900.0, ans=0.0 2024-08-09 14:41:05,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.158e+01 3.572e+01 4.344e+01 1.074e+02, threshold=7.144e+01, percent-clipped=1.0 2024-08-09 14:41:05,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4800, loss[loss=0.1335, beats_loss=0.01388, ecapa_loss=0.0004523, whisper_loss=0.1151, over 23109.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.01367, ecapa_loss=0.0005169, whisper_loss=0.1076, over 3914614.83 frames. ], batch size: 91, lr: 4.13e-02, grad_scale: 128.0 2024-08-09 14:41:17,574 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 from AS 2024-08-09 14:41:19,208 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 from AS 2024-08-09 14:41:19,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=48000.0, ans=0.0 2024-08-09 14:41:26,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=48100.0, ans=0.2 2024-08-09 14:41:50,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48200.0, ans=0.125 2024-08-09 14:42:03,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=48300.0, ans=0.00036956521739130513 2024-08-09 14:42:03,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=48300.0, ans=0.125 2024-08-09 14:42:11,666 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts.
26 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 14:42:20,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48400.0, ans=0.125 2024-08-09 14:42:20,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=11.98 vs. limit=10.0 2024-08-09 14:42:31,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4850, loss[loss=0.1501, beats_loss=0.01211, ecapa_loss=0.0004642, whisper_loss=0.1333, over 24219.00 frames. ], tot_loss[loss=0.1273, beats_loss=0.01368, ecapa_loss=0.0005177, whisper_loss=0.1085, over 3941032.97 frames. ], batch size: 94, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:42:43,195 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 14:42:54,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0 2024-08-09 14:43:00,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.00 vs.
limit=15.0 2024-08-09 14:43:03,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48700.0, ans=0.1 2024-08-09 14:43:19,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=48700.0, ans=0.125 2024-08-09 14:43:29,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=48800.0, ans=0.125 2024-08-09 14:43:32,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=48800.0, ans=0.2 2024-08-09 14:43:33,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-09 14:43:46,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=48900.0, ans=0.125 2024-08-09 14:43:55,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48900.0, ans=0.1 2024-08-09 14:44:00,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.275e+01 3.682e+01 4.305e+01 7.376e+01, threshold=7.365e+01, percent-clipped=1.0 2024-08-09 14:44:00,267 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4900, loss[loss=0.1427, beats_loss=0.01091, ecapa_loss=0.0004423, whisper_loss=0.1274, over 21417.00 frames. ], tot_loss[loss=0.127, beats_loss=0.01358, ecapa_loss=0.0005144, whisper_loss=0.1083, over 3915022.79 frames. 
], batch size: 79, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:44:04,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=49000.0, ans=0.125 2024-08-09 14:44:11,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=49000.0, ans=0.0 2024-08-09 14:44:19,878 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 14:44:27,625 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-09 14:44:41,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=49200.0, ans=0.2 2024-08-09 14:44:48,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=49200.0, ans=0.00017391304347826042 2024-08-09 14:44:52,641 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2024-08-09 14:45:00,851 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 from AS 2024-08-09 14:45:09,569 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 14:45:15,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=49400.0, ans=0.125 2024-08-09 14:45:26,184 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 4950, loss[loss=0.1474, beats_loss=0.01426, ecapa_loss=0.0004131, whisper_loss=0.129, over 23460.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.0136, ecapa_loss=0.0005095, whisper_loss=0.1078, over 3893793.86 frames.
], batch size: 90, lr: 4.11e-02, grad_scale: 128.0 2024-08-09 14:45:40,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49500.0, ans=0.1 2024-08-09 14:45:40,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-08-09 14:45:59,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=49700.0, ans=0.125 2024-08-09 14:46:00,816 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 13 from LS+wenet, 22 from Vox, 31 from AS 2024-08-09 14:46:01,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=12.0 2024-08-09 14:46:18,016 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 from AS 2024-08-09 14:46:52,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 3.042e+01 3.499e+01 4.372e+01 7.194e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-09 14:46:52,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5000, loss[loss=0.1355, beats_loss=0.009889, ecapa_loss=0.0006017, whisper_loss=0.1196, over 14067.00 frames. ], tot_loss[loss=0.1265, beats_loss=0.01352, ecapa_loss=0.0005105, whisper_loss=0.1078, over 3874053.01 frames. ], batch size: 56, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:46:55,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=50000.0, ans=0.0 2024-08-09 14:46:59,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2024-08-09 14:47:10,424 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts.
23 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 14:47:13,126 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-09 14:47:46,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0 2024-08-09 14:48:08,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5050, loss[loss=0.1223, beats_loss=0.01142, ecapa_loss=0.0004975, whisper_loss=0.1059, over 14217.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01348, ecapa_loss=0.0005107, whisper_loss=0.108, over 3875645.21 frames. ], batch size: 55, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:48:15,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-09 14:48:15,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-09 14:48:23,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.29 vs. limit=10.0 2024-08-09 14:48:28,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50600.0, ans=0.1 2024-08-09 14:48:46,041 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 from AS 2024-08-09 14:48:51,547 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts.
33 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 14:49:02,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50900.0, ans=0.1 2024-08-09 14:49:10,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=50900.0, ans=0.0 2024-08-09 14:49:10,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=50900.0, ans=0.125 2024-08-09 14:49:15,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 3.052e+01 3.532e+01 4.388e+01 7.103e+01, threshold=7.064e+01, percent-clipped=2.0 2024-08-09 14:49:15,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5100, loss[loss=0.1252, beats_loss=0.01458, ecapa_loss=0.0003511, whisper_loss=0.1071, over 23520.00 frames. ], tot_loss[loss=0.1277, beats_loss=0.01333, ecapa_loss=0.0005057, whisper_loss=0.1093, over 3877031.50 frames.
], batch size: 90, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:49:23,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=51000.0, ans=0.2 2024-08-09 14:49:31,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=51100.0, ans=0.1 2024-08-09 14:49:31,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51100.0, ans=0.1 2024-08-09 14:49:41,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=51200.0, ans=0.0 2024-08-09 14:49:43,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51200.0, ans=0.1 2024-08-09 14:49:50,649 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS 2024-08-09 14:49:53,276 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 14:49:57,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-09 14:50:04,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=51300.0, ans=0.125 2024-08-09 14:50:06,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=51400.0, ans=0.1 2024-08-09 14:50:20,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5150, loss[loss=0.1299, beats_loss=0.01231, ecapa_loss=0.0004932, whisper_loss=0.1127, over 18322.00 frames. ], tot_loss[loss=0.1271, beats_loss=0.01338, ecapa_loss=0.0005013, whisper_loss=0.1087, over 3856678.97 frames.
], batch size: 75, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:50:28,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51500.0, ans=0.125 2024-08-09 14:51:16,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=51900.0, ans=0.125 2024-08-09 14:51:23,667 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 from AS 2024-08-09 14:51:25,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.954e+01 3.465e+01 4.225e+01 6.973e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 14:51:25,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5200, loss[loss=0.1061, beats_loss=0.01539, ecapa_loss=0.0004575, whisper_loss=0.08615, over 18035.00 frames. ], tot_loss[loss=0.1267, beats_loss=0.01337, ecapa_loss=0.0004971, whisper_loss=0.1084, over 3821939.63 frames. ], batch size: 74, lr: 4.08e-02, grad_scale: 128.0 2024-08-09 14:51:38,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=52100.0, ans=0.125 2024-08-09 14:52:16,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=52400.0, ans=0.07 2024-08-09 14:52:28,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5250, loss[loss=0.1093, beats_loss=0.01246, ecapa_loss=0.0007179, whisper_loss=0.08969, over 17687.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01338, ecapa_loss=0.0004959, whisper_loss=0.1078, over 3810516.12 frames. ], batch size: 79, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:52:33,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=52500.0, ans=0.1 2024-08-09 14:52:50,998 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 14:53:07,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52800.0, ans=0.1 2024-08-09 14:53:16,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=52800.0, ans=0.125 2024-08-09 14:53:26,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52900.0, ans=0.0 2024-08-09 14:53:32,148 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 from AS 2024-08-09 14:53:33,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.986e+01 3.430e+01 3.984e+01 5.910e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 14:53:33,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5300, loss[loss=0.1162, beats_loss=0.01132, ecapa_loss=0.0005081, whisper_loss=0.09977, over 16757.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01335, ecapa_loss=0.0004934, whisper_loss=0.1083, over 3808659.10 frames.
], batch size: 60, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:53:41,614 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:53:57,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=53200.0, ans=0.2 2024-08-09 14:54:01,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=53200.0, ans=0.07 2024-08-09 14:54:04,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53200.0, ans=0.1 2024-08-09 14:54:38,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5350, loss[loss=0.1061, beats_loss=0.01457, ecapa_loss=0.0004187, whisper_loss=0.08735, over 14183.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01335, ecapa_loss=0.0004892, whisper_loss=0.1083, over 3835623.85 frames. ], batch size: 55, lr: 4.06e-02, grad_scale: 128.0 2024-08-09 14:54:53,237 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.658e-02 2024-08-09 14:54:53,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53600.0, ans=0.1 2024-08-09 14:54:59,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=53600.0, ans=0.0 2024-08-09 14:55:00,744 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 from AS 2024-08-09 14:55:03,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-09 14:55:06,994 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS 2024-08-09 14:55:11,074 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-09 14:55:13,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=53700.0, ans=0.0 2024-08-09 14:55:16,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=53800.0, ans=0.07 2024-08-09 14:55:24,023 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 28 from LS+wenet, 14 from Vox, 22 from AS 2024-08-09 14:55:28,926 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 14:55:43,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=54000.0, ans=0.2 2024-08-09 14:55:43,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.073e+01 3.494e+01 4.285e+01 8.308e+01, threshold=6.988e+01, percent-clipped=2.0 2024-08-09 14:55:43,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5400, loss[loss=0.1327, beats_loss=0.01428, ecapa_loss=0.0004651, whisper_loss=0.1138, over 20581.00 frames. ], tot_loss[loss=0.1261, beats_loss=0.01341, ecapa_loss=0.0004876, whisper_loss=0.1078, over 3836808.06 frames. ], batch size: 82, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:55:50,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54000.0, ans=0.1 2024-08-09 14:55:55,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=54100.0, ans=0.125 2024-08-09 14:56:09,653 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
24 from LS+wenet, 24 from Vox, 44 from AS 2024-08-09 14:56:17,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=54200.0, ans=0.125 2024-08-09 14:56:20,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-09 14:56:25,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=54300.0, ans=0.0 2024-08-09 14:56:30,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2024-08-09 14:56:33,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=54300.0, ans=0.125 2024-08-09 14:56:41,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-09 14:56:47,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5450, loss[loss=0.1432, beats_loss=0.01356, ecapa_loss=0.0003862, whisper_loss=0.1257, over 19063.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01337, ecapa_loss=0.0004884, whisper_loss=0.1072, over 3816227.80 frames. ], batch size: 69, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:56:55,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=54500.0, ans=0.125 2024-08-09 14:57:01,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs.
limit=15.0 2024-08-09 14:57:51,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 3.087e+01 3.659e+01 4.293e+01 7.884e+01, threshold=7.318e+01, percent-clipped=2.0 2024-08-09 14:57:51,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5500, loss[loss=0.1368, beats_loss=0.0118, ecapa_loss=0.0006359, whisper_loss=0.1186, over 21037.00 frames. ], tot_loss[loss=0.1259, beats_loss=0.01333, ecapa_loss=0.0004851, whisper_loss=0.1078, over 3801830.08 frames. ], batch size: 89, lr: 4.04e-02, grad_scale: 128.0 2024-08-09 14:58:01,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.82 vs. limit=10.0 2024-08-09 14:58:07,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=55100.0, ans=0.0 2024-08-09 14:58:10,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=55100.0, ans=0.125 2024-08-09 14:58:17,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=55200.0, ans=0.125 2024-08-09 14:58:55,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5550, loss[loss=0.1284, beats_loss=0.01086, ecapa_loss=0.0005542, whisper_loss=0.112, over 18183.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.01339, ecapa_loss=0.0004865, whisper_loss=0.1082, over 3853766.08 frames. ], batch size: 71, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 14:59:07,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=55600.0, ans=0.125 2024-08-09 14:59:18,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=55600.0, ans=0.0 2024-08-09 14:59:25,082 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
27 from LS+wenet, 15 from Vox, 35 from AS 2024-08-09 14:59:26,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55700.0, ans=0.1 2024-08-09 14:59:28,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=55700.0, ans=0.125 2024-08-09 14:59:42,860 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-09 14:59:43,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-09 14:59:45,348 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS 2024-08-09 14:59:46,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=55900.0, ans=0.0 2024-08-09 14:59:59,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.194e+01 3.634e+01 4.385e+01 7.525e+01, threshold=7.268e+01, percent-clipped=1.0 2024-08-09 14:59:59,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5600, loss[loss=0.1324, beats_loss=0.01438, ecapa_loss=0.0005791, whisper_loss=0.1122, over 21314.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01349, ecapa_loss=0.0004822, whisper_loss=0.1073, over 3889886.58 frames. ], batch size: 92, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 15:00:06,044 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 15:00:16,491 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 from AS 2024-08-09 15:00:17,624 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts.
19 from LS+wenet, 17 from Vox, 26 from AS 2024-08-09 15:00:20,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=56100.0, ans=0.2 2024-08-09 15:00:27,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-09 15:00:28,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0 2024-08-09 15:00:38,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=56300.0, ans=0.125 2024-08-09 15:00:39,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=56300.0, ans=0.0 2024-08-09 15:00:52,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=56400.0, ans=0.0 2024-08-09 15:00:54,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=56400.0, ans=0.0 2024-08-09 15:01:03,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5650, loss[loss=0.1448, beats_loss=0.00868, ecapa_loss=0.0005801, whisper_loss=0.1303, over 18689.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01349, ecapa_loss=0.0004828, whisper_loss=0.1071, over 3890863.48 frames. ], batch size: 73, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:01:14,364 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 15:01:14,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=56500.0, ans=0.09899494936611666 2024-08-09 15:01:22,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56600.0, ans=0.1 2024-08-09 15:01:30,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=56700.0, ans=10.0 2024-08-09 15:01:32,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=56700.0, ans=0.09899494936611666 2024-08-09 15:01:44,060 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 15:01:51,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=56800.0, ans=0.2 2024-08-09 15:01:54,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56900.0, ans=0.1 2024-08-09 15:02:01,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.32 vs. limit=22.5 2024-08-09 15:02:08,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.137e+01 3.741e+01 4.572e+01 6.525e+01, threshold=7.481e+01, percent-clipped=0.0 2024-08-09 15:02:08,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5700, loss[loss=0.1213, beats_loss=0.01447, ecapa_loss=0.000578, whisper_loss=0.101, over 20277.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01358, ecapa_loss=0.0004842, whisper_loss=0.1071, over 3903928.45 frames. 
], batch size: 86, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:02:10,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-09 15:02:20,278 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 15:02:29,617 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 15:02:42,278 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 15:02:49,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=57300.0, ans=0.0 2024-08-09 15:02:52,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=57300.0, ans=0.2 2024-08-09 15:03:13,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5750, loss[loss=0.09285, beats_loss=0.0137, ecapa_loss=0.0004246, whisper_loss=0.0749, over 16401.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01368, ecapa_loss=0.0004806, whisper_loss=0.1064, over 3917193.62 frames. ], batch size: 64, lr: 4.01e-02, grad_scale: 128.0 2024-08-09 15:03:24,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2024-08-09 15:03:28,027 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 15:03:31,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=57600.0, ans=0.0 2024-08-09 15:03:32,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=57600.0, ans=0.0 2024-08-09 15:03:37,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57600.0, ans=0.1 2024-08-09 15:03:43,791 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 15:03:49,286 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 15:04:01,283 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-09 15:04:13,738 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-09 15:04:18,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.931e+01 3.260e+01 3.924e+01 8.527e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-09 15:04:18,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5800, loss[loss=0.09743, beats_loss=0.01833, ecapa_loss=0.0003853, whisper_loss=0.07524, over 21577.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01367, ecapa_loss=0.0004759, whisper_loss=0.1066, over 3915678.20 frames. ], batch size: 89, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:04:42,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-09 15:04:59,199 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
29 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 15:04:59,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.87 vs. limit=22.5 2024-08-09 15:05:23,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=58400.0, ans=0.2 2024-08-09 15:05:25,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5850, loss[loss=0.1501, beats_loss=0.01107, ecapa_loss=0.0005038, whisper_loss=0.134, over 24155.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01362, ecapa_loss=0.0004788, whisper_loss=0.107, over 3925417.94 frames. ], batch size: 93, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:05:27,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=58500.0, ans=0.0 2024-08-09 15:05:47,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=58600.0, ans=0.125 2024-08-09 15:05:50,292 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 11 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 15:05:53,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=58700.0, ans=0.125 2024-08-09 15:06:11,907 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 15:06:15,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58800.0, ans=0.1 2024-08-09 15:06:21,769 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-09 15:06:23,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=58900.0, ans=0.2 2024-08-09 15:06:34,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 3.149e+01 3.698e+01 4.735e+01 7.316e+01, threshold=7.396e+01, percent-clipped=3.0 2024-08-09 15:06:34,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5900, loss[loss=0.09321, beats_loss=0.01541, ecapa_loss=0.0004897, whisper_loss=0.0729, over 15532.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01356, ecapa_loss=0.0004766, whisper_loss=0.1068, over 3901245.85 frames. ], batch size: 63, lr: 3.99e-02, grad_scale: 128.0 2024-08-09 15:06:37,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=59000.0, ans=0.025 2024-08-09 15:06:43,575 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 15:06:44,900 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 15:07:09,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=59200.0, ans=15.0 2024-08-09 15:07:14,524 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-09 15:07:17,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59300.0, ans=0.125 2024-08-09 15:07:39,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=59500.0, ans=0.025 2024-08-09 15:07:40,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 5950, loss[loss=0.1173, beats_loss=0.01315, ecapa_loss=0.000573, whisper_loss=0.09842, over 20869.00 frames. 
], tot_loss[loss=0.1243, beats_loss=0.0136, ecapa_loss=0.0004742, whisper_loss=0.106, over 3888872.70 frames. ], batch size: 88, lr: 3.98e-02, grad_scale: 128.0 2024-08-09 15:08:05,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2024-08-09 15:08:06,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=59600.0, ans=0.125 2024-08-09 15:08:13,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=59700.0, ans=0.125 2024-08-09 15:08:31,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.75 vs. limit=22.5 2024-08-09 15:08:33,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2024-08-09 15:08:48,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=59900.0, ans=0.2 2024-08-09 15:08:52,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.855e+01 3.241e+01 4.234e+01 7.891e+01, threshold=6.482e+01, percent-clipped=2.0 2024-08-09 15:08:52,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6000, loss[loss=0.08846, beats_loss=0.01508, ecapa_loss=0.000455, whisper_loss=0.06883, over 16241.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01348, ecapa_loss=0.0004728, whisper_loss=0.1057, over 3874057.49 frames. 
], batch size: 68, lr: 3.98e-02, grad_scale: 256.0 2024-08-09 15:08:52,089 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 15:09:28,581 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on ASR_libri: loss=0.2951, beats_loss=0, ecapa_loss=0.001297, whisper_loss=0.2822, over 922467.00 frames. 2024-08-09 15:09:39,047 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.8396, 1.7261, 1.8776, 1.3270, 0.6826, 1.7873, 1.8100, 1.4767], device='cuda:1') 2024-08-09 15:09:46,099 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on SV_voxceleb1: loss=0.01236, beats_loss=0, ecapa_loss=0.001236, whisper_loss=0, over 939242.00 frames. 2024-08-09 15:10:57,896 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8788, 5.8708, 5.7672, 5.9875], device='cuda:1') 2024-08-09 15:11:29,817 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on AT_audioset: loss=0.03246, beats_loss=0.03246, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 15:11:29,821 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 15:11:40,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=60000.0, ans=0.2 2024-08-09 15:11:56,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-09 15:12:02,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=60200.0, ans=0.125 2024-08-09 15:12:36,901 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 15:12:41,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=60400.0, ans=0.2 2024-08-09 15:12:44,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6050, loss[loss=0.1162, beats_loss=0.01339, ecapa_loss=0.000446, whisper_loss=0.09834, over 18850.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01334, ecapa_loss=0.0004694, whisper_loss=0.1067, over 3880728.57 frames. ], batch size: 74, lr: 3.97e-02, grad_scale: 256.0 2024-08-09 15:12:51,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.44 vs. limit=22.5 2024-08-09 15:12:53,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=60500.0, ans=0.125 2024-08-09 15:12:56,477 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 15:12:56,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=60500.0, ans=0.125 2024-08-09 15:13:02,575 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 15:13:02,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=60600.0, ans=0.0 2024-08-09 15:13:14,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=60700.0, ans=0.125 2024-08-09 15:13:20,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-09 15:13:47,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.95 vs. 
limit=22.5 2024-08-09 15:13:49,177 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 15:13:59,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.011e+01 3.542e+01 4.337e+01 6.873e+01, threshold=7.084e+01, percent-clipped=1.0 2024-08-09 15:13:59,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6100, loss[loss=0.1177, beats_loss=0.01289, ecapa_loss=0.000496, whisper_loss=0.09982, over 21064.00 frames. ], tot_loss[loss=0.1252, beats_loss=0.0133, ecapa_loss=0.0004693, whisper_loss=0.1072, over 3908394.80 frames. ], batch size: 84, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:14:00,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=61000.0, ans=0.09899494936611666 2024-08-09 15:14:17,197 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-09 15:14:28,015 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 15:14:36,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=61200.0, ans=0.0 2024-08-09 15:14:42,537 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 15:14:42,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=61300.0, ans=0.2 2024-08-09 15:14:56,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-08-09 15:15:11,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. 
limit=22.5 2024-08-09 15:15:13,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6150, loss[loss=0.1509, beats_loss=0.01085, ecapa_loss=0.0005327, whisper_loss=0.1347, over 19112.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01331, ecapa_loss=0.0004693, whisper_loss=0.107, over 3912214.46 frames. ], batch size: 75, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:15:22,417 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 15:15:28,414 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-09 15:15:35,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2024-08-09 15:15:37,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=61600.0, ans=0.2 2024-08-09 15:15:37,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-09 15:15:44,049 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 15:15:54,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-09 15:16:04,193 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 15:16:05,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=61800.0, ans=0.125 2024-08-09 15:16:27,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.116e+01 3.579e+01 4.385e+01 6.920e+01, threshold=7.157e+01, percent-clipped=0.0 2024-08-09 15:16:27,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6200, loss[loss=0.116, beats_loss=0.01436, ecapa_loss=0.0004872, whisper_loss=0.09676, over 15572.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01332, ecapa_loss=0.0004686, whisper_loss=0.1068, over 3881133.68 frames. ], batch size: 65, lr: 3.95e-02, grad_scale: 256.0 2024-08-09 15:16:34,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=62000.0, ans=0.2 2024-08-09 15:16:44,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=62100.0, ans=0.0 2024-08-09 15:16:47,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=62100.0, ans=0.0 2024-08-09 15:16:52,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2024-08-09 15:16:53,431 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-09 15:16:53,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=62100.0, ans=0.07 2024-08-09 15:16:57,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=62200.0, ans=0.0 2024-08-09 15:17:00,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=62200.0, ans=0.125 2024-08-09 15:17:00,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-08-09 15:17:10,744 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 15:17:19,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2024-08-09 15:17:21,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-08-09 15:17:25,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=62300.0, ans=0.125 2024-08-09 15:17:43,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6250, loss[loss=0.1267, beats_loss=0.01136, ecapa_loss=0.0004529, whisper_loss=0.1108, over 15416.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01327, ecapa_loss=0.0004666, whisper_loss=0.1075, over 3883744.49 frames. ], batch size: 58, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:17:48,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. 
limit=15.0 2024-08-09 15:18:01,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=62600.0, ans=0.125 2024-08-09 15:18:08,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=62600.0, ans=0.125 2024-08-09 15:18:10,110 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 15:18:12,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2024-08-09 15:18:35,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=62800.0, ans=0.0 2024-08-09 15:18:40,902 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 15:18:42,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=62900.0, ans=0.0 2024-08-09 15:19:00,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.965e+01 3.406e+01 4.255e+01 1.028e+02, threshold=6.812e+01, percent-clipped=2.0 2024-08-09 15:19:00,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6300, loss[loss=0.137, beats_loss=0.01165, ecapa_loss=0.0005588, whisper_loss=0.1197, over 21389.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01336, ecapa_loss=0.0004651, whisper_loss=0.1074, over 3909300.03 frames. ], batch size: 88, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:19:02,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. 
limit=15.0 2024-08-09 15:19:08,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=63000.0, ans=0.125 2024-08-09 15:19:22,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=63100.0, ans=0.125 2024-08-09 15:19:29,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-09 15:19:34,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=63200.0, ans=0.125 2024-08-09 15:19:39,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=63200.0, ans=0.125 2024-08-09 15:19:41,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0 2024-08-09 15:19:42,681 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-09 15:20:05,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=15.0 2024-08-09 15:20:16,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-08-09 15:20:18,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6350, loss[loss=0.1047, beats_loss=0.01589, ecapa_loss=0.0004615, whisper_loss=0.08419, over 20759.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01329, ecapa_loss=0.000468, whisper_loss=0.1076, over 3910767.58 frames. ], batch size: 89, lr: 3.93e-02, grad_scale: 256.0 2024-08-09 15:20:20,325 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 15:20:20,535 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:20:31,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=63500.0, ans=0.0 2024-08-09 15:20:50,052 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 15:20:55,921 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 20 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-09 15:20:59,321 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 15:21:20,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=63900.0, ans=0.2 2024-08-09 15:21:27,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=12.0 2024-08-09 15:21:38,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.075e+01 3.568e+01 4.201e+01 6.933e+01, threshold=7.136e+01, percent-clipped=1.0 2024-08-09 15:21:38,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6400, loss[loss=0.1283, beats_loss=0.01142, ecapa_loss=0.000476, whisper_loss=0.1121, over 22286.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01331, ecapa_loss=0.0004668, whisper_loss=0.1075, over 3888018.96 frames. ], batch size: 89, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:21:56,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=64100.0, ans=0.125 2024-08-09 15:22:34,017 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-09 15:22:35,430 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 15:22:38,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-09 15:22:41,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2024-08-09 15:22:45,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64400.0, ans=0.1 2024-08-09 15:22:45,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=64400.0, ans=0.125 2024-08-09 15:22:49,882 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-09 15:22:55,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=64400.0, ans=0.125 2024-08-09 15:22:57,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6450, loss[loss=0.124, beats_loss=0.01267, ecapa_loss=0.0004828, whisper_loss=0.1065, over 22096.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01335, ecapa_loss=0.000465, whisper_loss=0.1075, over 3887619.93 frames. ], batch size: 87, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:22:59,180 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-09 15:23:01,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64500.0, ans=0.1 2024-08-09 15:23:33,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64700.0, ans=0.1 2024-08-09 15:23:37,834 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 15:23:39,786 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.984e+00 2024-08-09 15:23:48,487 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 15:23:57,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.25 vs. limit=22.5 2024-08-09 15:24:01,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=64900.0, ans=0.125 2024-08-09 15:24:06,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=64900.0, ans=0.125 2024-08-09 15:24:09,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.92 vs. limit=10.0 2024-08-09 15:24:17,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.103e+01 3.527e+01 4.351e+01 8.335e+01, threshold=7.053e+01, percent-clipped=1.0 2024-08-09 15:24:17,672 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6500, loss[loss=0.1246, beats_loss=0.01184, ecapa_loss=0.0005223, whisper_loss=0.1075, over 16101.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01328, ecapa_loss=0.0004629, whisper_loss=0.1077, over 3858057.04 frames. ], batch size: 65, lr: 3.91e-02, grad_scale: 256.0 2024-08-09 15:24:40,097 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.251e+00 2024-08-09 15:24:44,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-09 15:25:11,364 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 18 from Vox, 23 from AS
2024-08-09 15:25:18,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=65300.0, ans=0.125
2024-08-09 15:25:21,173 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS
2024-08-09 15:25:37,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6550, loss[loss=0.1208, beats_loss=0.01322, ecapa_loss=0.0003892, whisper_loss=0.1036, over 18503.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01333, ecapa_loss=0.0004637, whisper_loss=0.1077, over 3891954.23 frames. ], batch size: 71, lr: 3.91e-02, grad_scale: 256.0
2024-08-09 15:25:38,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2024-08-09 15:25:51,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=65500.0, ans=0.07
2024-08-09 15:26:14,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65700.0, ans=0.125
2024-08-09 15:26:25,681 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 from AS
2024-08-09 15:26:28,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=65800.0, ans=0.125
2024-08-09 15:26:44,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65900.0, ans=0.125
2024-08-09 15:26:56,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66000.0, ans=0.1
2024-08-09 15:26:57,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 3.063e+01 3.628e+01 4.391e+01 7.750e+01, threshold=7.256e+01, percent-clipped=3.0
2024-08-09 15:26:57,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6600, loss[loss=0.1374, beats_loss=0.01095, ecapa_loss=0.0006426, whisper_loss=0.1201, over 17534.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01328, ecapa_loss=0.0004664, whisper_loss=0.1076, over 3880778.22 frames. ], batch size: 73, lr: 3.90e-02, grad_scale: 256.0
2024-08-09 15:27:02,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0
2024-08-09 15:27:15,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.30 vs. limit=6.0
2024-08-09 15:27:17,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0
2024-08-09 15:27:17,773 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 from AS
2024-08-09 15:27:19,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=66100.0, ans=0.125
2024-08-09 15:27:25,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66100.0, ans=0.125
2024-08-09 15:27:40,995 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS
2024-08-09 15:27:57,155 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 from AS
2024-08-09 15:27:57,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=66400.0, ans=0.125
2024-08-09 15:28:00,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66400.0, ans=0.125
2024-08-09 15:28:03,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2024-08-09 15:28:14,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6650, loss[loss=0.136, beats_loss=0.01348, ecapa_loss=0.0005098, whisper_loss=0.1174, over 22334.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01344, ecapa_loss=0.0004622, whisper_loss=0.1064, over 3914928.93 frames. ], batch size: 92, lr: 3.89e-02, grad_scale: 256.0
2024-08-09 15:28:19,623 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.90 vs. limit=10.0
2024-08-09 15:28:34,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66600.0, ans=0.1
2024-08-09 15:28:35,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=66600.0, ans=0.09899494936611666
2024-08-09 15:28:37,454 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-09 15:28:37,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66600.0, ans=0.1
2024-08-09 15:29:31,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.012e+01 3.433e+01 4.224e+01 7.038e+01, threshold=6.866e+01, percent-clipped=0.0
2024-08-09 15:29:31,318 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6700, loss[loss=0.1238, beats_loss=0.01688, ecapa_loss=0.0003725, whisper_loss=0.1032, over 14189.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01344, ecapa_loss=0.0004626, whisper_loss=0.1067, over 3939865.62 frames. ], batch size: 55, lr: 3.89e-02, grad_scale: 256.0
2024-08-09 15:29:57,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0
2024-08-09 15:30:00,600 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 from AS
2024-08-09 15:30:10,798 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 from AS
2024-08-09 15:30:15,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=67200.0, ans=0.2
2024-08-09 15:30:29,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0
2024-08-09 15:30:40,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=67400.0, ans=0.0
2024-08-09 15:30:47,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6750, loss[loss=0.1302, beats_loss=0.01134, ecapa_loss=0.0005208, whisper_loss=0.1136, over 18402.00 frames. ], tot_loss[loss=0.1252, beats_loss=0.0134, ecapa_loss=0.0004589, whisper_loss=0.1072, over 3910230.44 frames. ], batch size: 74, lr: 3.88e-02, grad_scale: 256.0
2024-08-09 15:30:50,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0
2024-08-09 15:31:01,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67600.0, ans=0.1
2024-08-09 15:31:10,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=67600.0, ans=0.0
2024-08-09 15:31:14,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2024-08-09 15:31:17,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.50 vs. limit=10.0
2024-08-09 15:31:45,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.15 vs. limit=22.5
2024-08-09 15:31:50,182 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS
2024-08-09 15:32:03,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.094e+01 3.540e+01 4.120e+01 7.157e+01, threshold=7.079e+01, percent-clipped=1.0
2024-08-09 15:32:03,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6800, loss[loss=0.116, beats_loss=0.01549, ecapa_loss=0.0004453, whisper_loss=0.09604, over 22605.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01347, ecapa_loss=0.0004618, whisper_loss=0.1062, over 3900051.71 frames. ], batch size: 90, lr: 3.87e-02, grad_scale: 256.0
2024-08-09 15:32:16,178 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 from AS
2024-08-09 15:32:33,166 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.943e+00
2024-08-09 15:32:41,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0
2024-08-09 15:33:02,305 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS
2024-08-09 15:33:06,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68400.0, ans=0.0
2024-08-09 15:33:06,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=68400.0, ans=0.125
2024-08-09 15:33:10,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.36 vs. limit=22.5
2024-08-09 15:33:11,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=68400.0, ans=0.07
2024-08-09 15:33:15,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=68400.0, ans=0.05
2024-08-09 15:33:16,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=12.0
2024-08-09 15:33:17,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6850, loss[loss=0.1167, beats_loss=0.01385, ecapa_loss=0.0005217, whisper_loss=0.09764, over 21556.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01346, ecapa_loss=0.0004622, whisper_loss=0.1052, over 3888199.47 frames. ], batch size: 89, lr: 3.87e-02, grad_scale: 256.0
2024-08-09 15:33:25,526 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 from AS
2024-08-09 15:33:38,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=68600.0, ans=0.125
2024-08-09 15:33:44,896 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS
2024-08-09 15:33:59,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68700.0, ans=0.125
2024-08-09 15:34:04,306 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS
2024-08-09 15:34:06,629 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 from AS
2024-08-09 15:34:31,791 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS
2024-08-09 15:34:33,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.018e+01 3.583e+01 4.075e+01 7.184e+01, threshold=7.167e+01, percent-clipped=2.0
2024-08-09 15:34:33,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6900, loss[loss=0.1315, beats_loss=0.013, ecapa_loss=0.000383, whisper_loss=0.1147, over 24004.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01357, ecapa_loss=0.0004591, whisper_loss=0.1049, over 3869775.82 frames. ], batch size: 91, lr: 3.86e-02, grad_scale: 256.0
2024-08-09 15:34:45,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=69000.0, ans=0.125
2024-08-09 15:35:19,419 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.035e+00
2024-08-09 15:35:21,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0
2024-08-09 15:35:41,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=69400.0, ans=0.07
2024-08-09 15:35:44,241 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 from AS
2024-08-09 15:35:49,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 6950, loss[loss=0.1103, beats_loss=0.01142, ecapa_loss=0.0003889, whisper_loss=0.09497, over 14672.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01355, ecapa_loss=0.0004549, whisper_loss=0.1057, over 3884077.83 frames. ], batch size: 56, lr: 3.85e-02, grad_scale: 256.0
2024-08-09 15:35:53,724 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 from AS
2024-08-09 15:36:04,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0
2024-08-09 15:36:15,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69600.0, ans=0.125
2024-08-09 15:36:19,462 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 from AS
2024-08-09 15:36:33,643 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 from AS
2024-08-09 15:36:48,845 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 from AS
2024-08-09 15:36:56,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=69900.0, ans=0.125
2024-08-09 15:37:07,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.090e+01 3.523e+01 4.430e+01 8.295e+01, threshold=7.046e+01, percent-clipped=3.0
2024-08-09 15:37:07,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7000, loss[loss=0.132, beats_loss=0.01262, ecapa_loss=0.0004676, whisper_loss=0.1148, over 14346.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.0134, ecapa_loss=0.0004565, whisper_loss=0.1064, over 3872240.74 frames. ], batch size: 58, lr: 3.85e-02, grad_scale: 256.0
2024-08-09 15:37:11,850 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 15:37:14,516 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 from AS
2024-08-09 15:37:19,014 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS
2024-08-09 15:37:46,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0
2024-08-09 15:38:01,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0
2024-08-09 15:38:02,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=70300.0, ans=0.2
2024-08-09 15:38:09,585 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.986e-01
2024-08-09 15:38:13,206 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 24 from Vox, 23 from AS
2024-08-09 15:38:14,659 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 25 from Vox, 34 from AS
2024-08-09 15:38:14,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70400.0, ans=0.125
2024-08-09 15:38:26,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70400.0, ans=0.125
2024-08-09 15:38:29,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7050, loss[loss=0.1503, beats_loss=0.01224, ecapa_loss=0.0004584, whisper_loss=0.1334, over 22481.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01338, ecapa_loss=0.0004549, whisper_loss=0.1061, over 3887554.56 frames. ], batch size: 89, lr: 3.84e-02, grad_scale: 256.0
2024-08-09 15:38:41,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70500.0, ans=0.1
2024-08-09 15:38:53,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0
2024-08-09 15:38:53,882 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-09 15:38:53,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=70600.0, ans=0.125
2024-08-09 15:39:06,818 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 from AS
2024-08-09 15:39:10,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=70700.0, ans=0.125
2024-08-09 15:39:22,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=70800.0, ans=0.125
2024-08-09 15:39:27,223 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 15 from Vox, 46 from AS
2024-08-09 15:39:27,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70900.0, ans=0.1
2024-08-09 15:39:34,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=70900.0, ans=0.0
2024-08-09 15:39:43,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.931e+01 3.439e+01 4.149e+01 6.385e+01, threshold=6.878e+01, percent-clipped=0.0
2024-08-09 15:39:43,688 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7100, loss[loss=0.141, beats_loss=0.01073, ecapa_loss=0.0004227, whisper_loss=0.1261, over 15163.00 frames. ], tot_loss[loss=0.1246, beats_loss=0.0133, ecapa_loss=0.0004519, whisper_loss=0.1068, over 3905538.19 frames. ], batch size: 55, lr: 3.83e-02, grad_scale: 256.0
2024-08-09 15:39:48,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=71000.0, ans=0.0
2024-08-09 15:39:51,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=71000.0, ans=0.1
2024-08-09 15:40:07,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.54 vs. limit=22.5
2024-08-09 15:40:08,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=71100.0, ans=0.2
2024-08-09 15:40:12,667 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS
2024-08-09 15:40:15,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=71200.0, ans=0.07
2024-08-09 15:40:31,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71300.0, ans=0.125
2024-08-09 15:40:32,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5
2024-08-09 15:40:39,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.37 vs. limit=22.5
2024-08-09 15:40:48,111 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-09 15:40:52,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=71400.0, ans=0.2
2024-08-09 15:40:55,066 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS
2024-08-09 15:40:56,742 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 32 from Vox, 32 from AS
2024-08-09 15:41:00,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7150, loss[loss=0.114, beats_loss=0.01463, ecapa_loss=0.0004864, whisper_loss=0.09445, over 18137.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01336, ecapa_loss=0.0004499, whisper_loss=0.1061, over 3911486.78 frames. ], batch size: 74, lr: 3.83e-02, grad_scale: 256.0
2024-08-09 15:41:06,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=71500.0, ans=0.125
2024-08-09 15:41:16,939 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 from AS
2024-08-09 15:41:17,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5
2024-08-09 15:41:35,834 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 22 from Vox, 31 from AS
2024-08-09 15:41:36,214 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.126e-01
2024-08-09 15:41:37,240 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS
2024-08-09 15:41:46,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=71800.0, ans=0.125
2024-08-09 15:41:50,284 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.875e-01
2024-08-09 15:41:51,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=71800.0, ans=0.125
2024-08-09 15:41:53,158 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 15:41:54,709 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 26 from Vox, 28 from AS
2024-08-09 15:41:59,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=71800.0, ans=0.0
2024-08-09 15:42:09,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=71900.0, ans=0.05
2024-08-09 15:42:21,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.087e+01 3.536e+01 4.239e+01 7.384e+01, threshold=7.073e+01, percent-clipped=1.0
2024-08-09 15:42:21,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7200, loss[loss=0.1223, beats_loss=0.01033, ecapa_loss=0.0005396, whisper_loss=0.1066, over 22531.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01333, ecapa_loss=0.0004497, whisper_loss=0.1062, over 3900932.30 frames. ], batch size: 91, lr: 3.82e-02, grad_scale: 256.0
2024-08-09 15:42:32,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72000.0, ans=0.1
2024-08-09 15:42:38,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=72100.0, ans=0.125
2024-08-09 15:43:17,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.61 vs. limit=22.5
2024-08-09 15:43:24,791 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 31 from Vox, 40 from AS
2024-08-09 15:43:27,875 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 from AS
2024-08-09 15:43:46,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7250, loss[loss=0.1319, beats_loss=0.009555, ecapa_loss=0.0005334, whisper_loss=0.117, over 19127.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01331, ecapa_loss=0.0004506, whisper_loss=0.1058, over 3893660.73 frames. ], batch size: 77, lr: 3.82e-02, grad_scale: 256.0
2024-08-09 15:43:47,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=72500.0, ans=0.125
2024-08-09 15:43:50,271 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 16 from Vox, 35 from AS
2024-08-09 15:44:03,232 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 from AS
2024-08-09 15:44:39,015 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.095e-02
2024-08-09 15:44:45,591 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS
2024-08-09 15:44:45,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=72800.0, ans=0.125
2024-08-09 15:44:53,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72800.0, ans=0.1
2024-08-09 15:45:09,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=72900.0, ans=0.0
2024-08-09 15:45:18,754 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 from AS
2024-08-09 15:45:20,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 3.061e+01 3.709e+01 4.320e+01 7.317e+01, threshold=7.418e+01, percent-clipped=1.0
2024-08-09 15:45:20,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7300, loss[loss=0.1258, beats_loss=0.01534, ecapa_loss=0.0004262, whisper_loss=0.1062, over 16979.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01326, ecapa_loss=0.0004537, whisper_loss=0.1066, over 3878790.98 frames. ], batch size: 69, lr: 3.81e-02, grad_scale: 256.0
2024-08-09 15:45:41,694 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS
2024-08-09 15:45:41,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=73100.0, ans=0.125
2024-08-09 15:45:44,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=73100.0, ans=0.125
2024-08-09 15:45:54,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=15.0
2024-08-09 15:46:05,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0
2024-08-09 15:46:33,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0
2024-08-09 15:46:38,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=73400.0, ans=0.125
2024-08-09 15:46:43,741 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.565e+00
2024-08-09 15:46:46,475 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-09 15:46:50,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73400.0, ans=0.1
2024-08-09 15:46:52,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7350, loss[loss=0.1206, beats_loss=0.01558, ecapa_loss=0.0003907, whisper_loss=0.1012, over 23288.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01332, ecapa_loss=0.0004514, whisper_loss=0.1058, over 3885172.59 frames. ], batch size: 91, lr: 3.80e-02, grad_scale: 256.0
2024-08-09 15:47:09,194 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 from AS
2024-08-09 15:47:14,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=73600.0, ans=0.125
2024-08-09 15:47:29,610 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 from AS
2024-08-09 15:48:11,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0
2024-08-09 15:48:15,834 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 from AS
2024-08-09 15:48:21,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.843e+01 3.393e+01 4.039e+01 7.371e+01, threshold=6.786e+01, percent-clipped=0.0
2024-08-09 15:48:21,331 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7400, loss[loss=0.134, beats_loss=0.01114, ecapa_loss=0.000511, whisper_loss=0.1177, over 19933.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01331, ecapa_loss=0.0004537, whisper_loss=0.1061, over 3883117.17 frames. ], batch size: 81, lr: 3.80e-02, grad_scale: 256.0
2024-08-09 15:48:22,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=74000.0, ans=0.2
2024-08-09 15:48:37,814 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 from AS
2024-08-09 15:48:42,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0
2024-08-09 15:48:47,651 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 from AS
2024-08-09 15:48:48,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=74100.0, ans=0.07
2024-08-09 15:48:51,298 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 from AS
2024-08-09 15:49:15,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=74200.0, ans=0.0
2024-08-09 15:49:28,786 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 from AS
2024-08-09 15:49:45,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=74400.0, ans=10.0
2024-08-09 15:49:46,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0
2024-08-09 15:49:47,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0
2024-08-09 15:49:56,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7450, loss[loss=0.1329, beats_loss=0.01134, ecapa_loss=0.0005491, whisper_loss=0.1161, over 21238.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01336, ecapa_loss=0.0004523, whisper_loss=0.1059, over 3881022.38 frames. ], batch size: 90, lr: 3.79e-02, grad_scale: 256.0
2024-08-09 15:50:24,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=74600.0, ans=0.0
2024-08-09 15:50:29,905 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 from AS
2024-08-09 15:50:30,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0
2024-08-09 15:50:34,631 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 14 from Vox, 54 from AS
2024-08-09 15:50:38,086 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-09 15:50:46,068 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 16 from Vox, 52 from AS
2024-08-09 15:50:53,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=74800.0, ans=0.125
2024-08-09 15:51:10,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=74900.0, ans=0.2
2024-08-09 15:51:13,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 3.130e+01 3.399e+01 4.155e+01 7.076e+01, threshold=6.798e+01, percent-clipped=1.0
2024-08-09 15:51:13,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7500, loss[loss=0.1084, beats_loss=0.01594, ecapa_loss=0.0004122, whisper_loss=0.08837, over 20828.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01333, ecapa_loss=0.0004513, whisper_loss=0.106, over 3890645.27 frames. ], batch size: 87, lr: 3.78e-02, grad_scale: 256.0
2024-08-09 15:51:14,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=15.0
2024-08-09 15:51:20,138 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS
2024-08-09 15:51:22,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=75000.0, ans=0.125
2024-08-09 15:51:43,046 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS
2024-08-09 15:52:13,773 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS
2024-08-09 15:52:16,794 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 from AS
2024-08-09 15:52:24,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7550, loss[loss=0.137, beats_loss=0.0113, ecapa_loss=0.000476, whisper_loss=0.121, over 22424.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01342, ecapa_loss=0.0004476, whisper_loss=0.1056, over 3866104.96 frames. ], batch size: 89, lr: 3.78e-02, grad_scale: 256.0
2024-08-09 15:52:29,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=75500.0, ans=0.125
2024-08-09 15:52:37,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75600.0, ans=0.1
2024-08-09 15:52:42,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75600.0, ans=0.1
2024-08-09 15:52:51,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5
2024-08-09 15:53:09,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=75800.0, ans=0.0
2024-08-09 15:53:10,782 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 from AS
2024-08-09 15:53:34,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=76000.0, ans=0.2
2024-08-09 15:53:35,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 3.036e+01 3.542e+01 4.226e+01 5.898e+01, threshold=7.084e+01, percent-clipped=0.0
2024-08-09 15:53:35,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7600, loss[loss=0.1404, beats_loss=0.01073, ecapa_loss=0.0004402, whisper_loss=0.1252, over 20690.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01338, ecapa_loss=0.0004484, whisper_loss=0.1057, over 3856110.12 frames. ], batch size: 78, lr: 3.77e-02, grad_scale: 256.0
2024-08-09 15:53:36,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.04 vs. limit=6.0
2024-08-09 15:54:03,658 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 11 from Vox, 28 from AS
2024-08-09 15:54:10,536 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 20 from Vox, 35 from AS
2024-08-09 15:54:13,351 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 from AS
2024-08-09 15:54:26,658 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS
2024-08-09 15:54:37,718 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 from AS
2024-08-09 15:54:45,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5
2024-08-09 15:54:46,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7650, loss[loss=0.1309, beats_loss=0.01414, ecapa_loss=0.0004033, whisper_loss=0.1127, over 22928.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01335, ecapa_loss=0.0004486, whisper_loss=0.1063, over 3863326.92 frames. ], batch size: 93, lr: 3.77e-02, grad_scale: 256.0
2024-08-09 15:54:54,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.56 vs. limit=15.0
2024-08-09 15:55:14,048 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 17 from LS+wenet, 28 from Vox, 31 from AS
2024-08-09 15:55:15,650 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 29 from Vox, 39 from AS
2024-08-09 15:55:17,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=76700.0, ans=0.125
2024-08-09 15:55:20,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=76700.0, ans=0.0
2024-08-09 15:55:29,526 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS
2024-08-09 15:55:34,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=76800.0, ans=10.0
2024-08-09 15:55:40,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=76900.0, ans=0.0
2024-08-09 15:55:54,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=77000.0, ans=0.2
2024-08-09 15:55:55,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.321e+01 3.065e+01 3.556e+01 4.140e+01 7.466e+01, threshold=7.113e+01, percent-clipped=1.0
2024-08-09 15:55:55,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7700, loss[loss=0.1045, beats_loss=0.01562, ecapa_loss=0.0004362, whisper_loss=0.08456, over 21001.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01333, ecapa_loss=0.0004474, whisper_loss=0.1052, over 3865056.61 frames.
], batch size: 87, lr: 3.76e-02, grad_scale: 256.0 2024-08-09 15:56:14,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=77100.0, ans=0.0 2024-08-09 15:56:27,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=77200.0, ans=0.125 2024-08-09 15:56:31,550 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:56:34,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=77200.0, ans=0.0 2024-08-09 15:56:34,511 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.079e+01 2024-08-09 15:56:40,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=77300.0, ans=0.125 2024-08-09 15:56:47,375 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 15:57:01,697 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 15:57:04,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77400.0, ans=0.125 2024-08-09 15:57:07,549 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7750, loss[loss=0.1101, beats_loss=0.01504, ecapa_loss=0.0004058, whisper_loss=0.09097, over 19773.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01331, ecapa_loss=0.0004438, whisper_loss=0.1058, over 3874679.85 frames. 
], batch size: 79, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:57:19,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=77500.0, ans=0.125 2024-08-09 15:57:22,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=77600.0, ans=0.0 2024-08-09 15:57:30,301 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 15:57:34,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=77700.0, ans=0.125 2024-08-09 15:57:40,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=77700.0, ans=0.125 2024-08-09 15:57:47,402 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-09 15:57:52,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=77800.0, ans=0.0 2024-08-09 15:57:54,030 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 15:58:02,350 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 15:58:14,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=77900.0, ans=0.0 2024-08-09 15:58:17,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.915e+01 3.303e+01 4.126e+01 7.711e+01, threshold=6.607e+01, percent-clipped=1.0 2024-08-09 15:58:17,300 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7800, loss[loss=0.08612, beats_loss=0.01367, ecapa_loss=0.0004761, whisper_loss=0.06768, over 13546.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.01336, ecapa_loss=0.0004439, whisper_loss=0.1054, over 3865197.08 frames. 
], batch size: 56, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:58:25,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=78000.0, ans=0.2 2024-08-09 15:58:43,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2024-08-09 15:58:47,255 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 15:58:50,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=78200.0, ans=0.0 2024-08-09 15:59:08,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=78300.0, ans=0.0 2024-08-09 15:59:12,920 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 15:59:19,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=78400.0, ans=0.2 2024-08-09 15:59:26,498 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7850, loss[loss=0.137, beats_loss=0.01486, ecapa_loss=0.0003973, whisper_loss=0.1182, over 22841.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01332, ecapa_loss=0.0004415, whisper_loss=0.1061, over 3850675.11 frames. ], batch size: 92, lr: 3.74e-02, grad_scale: 256.0 2024-08-09 15:59:39,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=78600.0, ans=0.0 2024-08-09 15:59:50,280 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-09 15:59:52,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78600.0, ans=0.125 2024-08-09 15:59:56,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78700.0, ans=0.1 2024-08-09 16:00:29,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.33 vs. limit=15.0 2024-08-09 16:00:35,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 3.036e+01 3.521e+01 4.450e+01 7.582e+01, threshold=7.043e+01, percent-clipped=4.0 2024-08-09 16:00:35,160 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7900, loss[loss=0.138, beats_loss=0.01333, ecapa_loss=0.0003988, whisper_loss=0.1207, over 18130.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.01342, ecapa_loss=0.0004411, whisper_loss=0.1051, over 3834204.80 frames. 
], batch size: 70, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:00:57,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=79100.0, ans=0.035 2024-08-09 16:00:57,336 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.175e+00 2024-08-09 16:01:10,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79200.0, ans=0.1 2024-08-09 16:01:11,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=79200.0, ans=0.0 2024-08-09 16:01:27,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=79300.0, ans=0.125 2024-08-09 16:01:43,801 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 7950, loss[loss=0.1383, beats_loss=0.01357, ecapa_loss=0.0004624, whisper_loss=0.1201, over 20291.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01344, ecapa_loss=0.0004368, whisper_loss=0.1049, over 3831256.99 frames. ], batch size: 84, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:01:53,431 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:02:05,446 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 16:02:25,086 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-09 16:02:25,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=79800.0, ans=0.04949747468305833 2024-08-09 16:02:31,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=79800.0, ans=0.125 2024-08-09 16:02:38,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2024-08-09 16:02:42,938 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=12.0 2024-08-09 16:02:48,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2024-08-09 16:02:54,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 3.066e+01 3.561e+01 4.217e+01 9.530e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 16:02:54,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8000, loss[loss=0.133, beats_loss=0.008772, ecapa_loss=0.0004995, whisper_loss=0.1193, over 14038.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01336, ecapa_loss=0.000433, whisper_loss=0.1053, over 3851062.72 frames. ], batch size: 56, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:03:14,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2024-08-09 16:03:17,848 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 16:03:21,503 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 16:03:23,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2024-08-09 16:03:29,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80200.0, ans=0.1 2024-08-09 16:03:31,364 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-09 16:03:52,794 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 16:04:01,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8050, loss[loss=0.1313, beats_loss=0.01409, ecapa_loss=0.0004561, whisper_loss=0.1126, over 21953.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.01334, ecapa_loss=0.0004368, whisper_loss=0.1047, over 3830849.76 frames. ], batch size: 88, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:04:05,766 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 16:04:20,181 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-09 16:04:32,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=80700.0, ans=0.2 2024-08-09 16:04:36,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80700.0, ans=0.1 2024-08-09 16:04:40,271 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-09 16:04:41,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=80800.0, ans=0.07 2024-08-09 16:04:44,754 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
19 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-09 16:05:10,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 3.011e+01 3.515e+01 4.189e+01 8.391e+01, threshold=7.029e+01, percent-clipped=0.0 2024-08-09 16:05:10,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8100, loss[loss=0.1472, beats_loss=0.01063, ecapa_loss=0.0004315, whisper_loss=0.1322, over 23388.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.0133, ecapa_loss=0.0004378, whisper_loss=0.1055, over 3847980.44 frames. ], batch size: 91, lr: 3.71e-02, grad_scale: 512.0 2024-08-09 16:05:10,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=81000.0, ans=0.0 2024-08-09 16:05:33,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81100.0, ans=0.125 2024-08-09 16:05:44,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=81200.0, ans=0.125 2024-08-09 16:05:47,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2024-08-09 16:05:48,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=81200.0, ans=0.125 2024-08-09 16:05:54,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81300.0, ans=0.1 2024-08-09 16:06:08,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81400.0, ans=0.125 2024-08-09 16:06:09,273 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
19 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-09 16:06:09,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81400.0, ans=0.125 2024-08-09 16:06:19,028 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8150, loss[loss=0.1066, beats_loss=0.01487, ecapa_loss=0.0004755, whisper_loss=0.08701, over 17304.00 frames. ], tot_loss[loss=0.1228, beats_loss=0.01329, ecapa_loss=0.0004379, whisper_loss=0.1051, over 3845326.45 frames. ], batch size: 76, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:06:29,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.09 vs. limit=22.5 2024-08-09 16:06:40,225 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 16:06:52,895 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 16:06:54,592 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 16:06:55,797 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 16:07:00,120 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 16:07:09,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=81800.0, ans=0.0 2024-08-09 16:07:27,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.111e+01 3.553e+01 4.149e+01 8.297e+01, threshold=7.106e+01, percent-clipped=2.0 2024-08-09 16:07:27,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8200, loss[loss=0.1344, beats_loss=0.0139, ecapa_loss=0.0003705, whisper_loss=0.1168, over 23524.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.01325, ecapa_loss=0.0004356, whisper_loss=0.1053, over 3861305.67 frames. 
], batch size: 91, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:07:38,661 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 16:07:41,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=82100.0, ans=0.0 2024-08-09 16:07:42,633 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 16:07:46,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=82100.0, ans=0.125 2024-08-09 16:07:48,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=12.0 2024-08-09 16:07:53,496 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 16:07:53,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=82200.0, ans=0.125 2024-08-09 16:08:00,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=82200.0, ans=0.125 2024-08-09 16:08:14,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-08-09 16:08:34,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=82400.0, ans=0.125 2024-08-09 16:08:34,993 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 16:08:36,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8250, loss[loss=0.1202, beats_loss=0.0121, ecapa_loss=0.0004857, whisper_loss=0.1033, over 15045.00 frames. 
], tot_loss[loss=0.1238, beats_loss=0.01318, ecapa_loss=0.0004364, whisper_loss=0.1062, over 3880038.73 frames. ], batch size: 60, lr: 3.69e-02, grad_scale: 512.0 2024-08-09 16:08:38,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=82500.0, ans=0.0 2024-08-09 16:08:39,198 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 16:08:51,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=82600.0, ans=0.0 2024-08-09 16:08:52,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=82600.0, ans=0.125 2024-08-09 16:08:55,301 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 16:09:08,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82700.0, ans=0.1 2024-08-09 16:09:09,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=82700.0, ans=0.125 2024-08-09 16:09:12,162 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 16:09:13,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=82700.0, ans=0.2 2024-08-09 16:09:23,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=82800.0, ans=0.125 2024-08-09 16:09:33,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=15.0 2024-08-09 16:09:34,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=82900.0, ans=0.125 2024-08-09 16:09:37,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=82900.0, ans=0.07 2024-08-09 16:09:37,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82900.0, ans=0.1 2024-08-09 16:09:44,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.998e+01 3.523e+01 3.969e+01 6.917e+01, threshold=7.045e+01, percent-clipped=0.0 2024-08-09 16:09:44,624 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8300, loss[loss=0.1399, beats_loss=0.0109, ecapa_loss=0.0004434, whisper_loss=0.1246, over 17680.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.01325, ecapa_loss=0.0004333, whisper_loss=0.1054, over 3885387.39 frames. ], batch size: 68, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:09:47,635 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 16:10:05,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=83100.0, ans=0.035 2024-08-09 16:10:21,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83200.0, ans=0.1 2024-08-09 16:10:29,616 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 16:10:39,004 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 16:10:57,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8350, loss[loss=0.1255, beats_loss=0.01273, ecapa_loss=0.0003936, whisper_loss=0.1089, over 18253.00 frames. 
], tot_loss[loss=0.1229, beats_loss=0.0132, ecapa_loss=0.0004344, whisper_loss=0.1053, over 3878373.89 frames. ], batch size: 71, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:10:59,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=83500.0, ans=0.0 2024-08-09 16:11:02,117 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 16:11:06,524 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.554e-01 2024-08-09 16:11:07,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=83500.0, ans=0.1 2024-08-09 16:11:33,988 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 25 from Vox, 15 fro AS 2024-08-09 16:11:49,667 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 16:11:51,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83900.0, ans=0.1 2024-08-09 16:11:57,563 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 16:12:08,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.077e+01 3.401e+01 4.133e+01 6.317e+01, threshold=6.802e+01, percent-clipped=0.0 2024-08-09 16:12:08,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8400, loss[loss=0.1226, beats_loss=0.01335, ecapa_loss=0.000485, whisper_loss=0.1044, over 17137.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01319, ecapa_loss=0.0004336, whisper_loss=0.1058, over 3906938.60 frames. ], batch size: 70, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:12:13,706 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 16:12:32,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=84100.0, ans=0.125 2024-08-09 16:12:37,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-09 16:12:42,089 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 16:12:48,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=84200.0, ans=0.95 2024-08-09 16:12:56,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84300.0, ans=0.1 2024-08-09 16:13:30,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8450, loss[loss=0.1399, beats_loss=0.01266, ecapa_loss=0.0004095, whisper_loss=0.1232, over 22120.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01315, ecapa_loss=0.0004316, whisper_loss=0.1059, over 3925797.61 frames. ], batch size: 87, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:13:30,319 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 16:13:49,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=84600.0, ans=0.125 2024-08-09 16:13:50,942 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 16:13:54,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=84600.0, ans=0.09899494936611666 2024-08-09 16:13:55,040 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
17 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-09 16:14:01,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=84700.0, ans=0.05 2024-08-09 16:14:16,941 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 16:14:18,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84800.0, ans=0.1 2024-08-09 16:14:29,240 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 16:14:39,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84900.0, ans=0.1 2024-08-09 16:14:40,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=84900.0, ans=0.2 2024-08-09 16:14:44,847 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.323e-02 2024-08-09 16:14:51,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.954e+01 3.407e+01 4.304e+01 7.894e+01, threshold=6.814e+01, percent-clipped=2.0 2024-08-09 16:14:51,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8500, loss[loss=0.127, beats_loss=0.01255, ecapa_loss=0.0003598, whisper_loss=0.1108, over 17587.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.01316, ecapa_loss=0.0004305, whisper_loss=0.1058, over 3963829.27 frames. ], batch size: 64, lr: 3.66e-02, grad_scale: 512.0 2024-08-09 16:15:27,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0 2024-08-09 16:15:36,836 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
32 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 16:16:00,339 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 16:16:08,482 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 40 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 16:16:15,593 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 16:16:26,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8550, loss[loss=0.09193, beats_loss=0.01436, ecapa_loss=0.0004794, whisper_loss=0.07277, over 18316.00 frames. ], tot_loss[loss=0.1228, beats_loss=0.01332, ecapa_loss=0.0004262, whisper_loss=0.1053, over 3953047.17 frames. ], batch size: 79, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:16:28,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.48 vs. limit=10.0 2024-08-09 16:16:36,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85500.0, ans=0.0 2024-08-09 16:16:36,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. 
limit=15.0 2024-08-09 16:16:50,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=85600.0, ans=0.125 2024-08-09 16:16:56,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=85600.0, ans=0.125 2024-08-09 16:16:58,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85600.0, ans=0.125 2024-08-09 16:17:14,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85700.0, ans=0.0 2024-08-09 16:17:16,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-09 16:17:56,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=85900.0, ans=0.125 2024-08-09 16:18:03,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.923e+01 3.374e+01 4.145e+01 6.398e+01, threshold=6.748e+01, percent-clipped=0.0 2024-08-09 16:18:03,702 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8600, loss[loss=0.1287, beats_loss=0.01493, ecapa_loss=0.0003908, whisper_loss=0.1098, over 21577.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01324, ecapa_loss=0.0004268, whisper_loss=0.1056, over 3939861.47 frames. ], batch size: 87, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:18:07,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=86000.0, ans=0.09899494936611666 2024-08-09 16:18:26,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2024-08-09 16:18:35,049 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 16:18:46,444 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 16:19:11,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.55 vs. limit=15.0 2024-08-09 16:19:21,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=86400.0, ans=0.125 2024-08-09 16:19:34,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=86400.0, ans=0.125 2024-08-09 16:19:36,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86400.0, ans=0.1 2024-08-09 16:19:40,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8650, loss[loss=0.1148, beats_loss=0.01234, ecapa_loss=0.0004982, whisper_loss=0.09747, over 18529.00 frames. ], tot_loss[loss=0.1232, beats_loss=0.01327, ecapa_loss=0.0004273, whisper_loss=0.1056, over 3924372.88 frames. ], batch size: 76, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:19:45,778 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-09 16:19:46,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=86500.0, ans=0.125 2024-08-09 16:20:25,628 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:20:26,571 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-09 16:20:29,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=86800.0, ans=0.125 2024-08-09 16:20:31,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-09 16:20:37,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=86900.0, ans=0.125 2024-08-09 16:20:59,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.913e+01 3.504e+01 4.209e+01 7.626e+01, threshold=7.009e+01, percent-clipped=5.0 2024-08-09 16:20:59,470 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8700, loss[loss=0.1528, beats_loss=0.01181, ecapa_loss=0.0005211, whisper_loss=0.1358, over 22683.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01323, ecapa_loss=0.0004272, whisper_loss=0.1055, over 3915182.21 frames. ], batch size: 93, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:20:59,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87000.0, ans=0.125 2024-08-09 16:22:18,284 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=15.0 2024-08-09 16:22:20,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=87400.0, ans=0.125 2024-08-09 16:22:28,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8750, loss[loss=0.1231, beats_loss=0.01038, ecapa_loss=0.0005517, whisper_loss=0.1072, over 16244.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01316, ecapa_loss=0.0004267, whisper_loss=0.1056, over 3887044.26 frames. 
], batch size: 68, lr: 3.63e-02, grad_scale: 512.0 2024-08-09 16:22:44,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=87600.0, ans=0.125 2024-08-09 16:22:50,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=87600.0, ans=0.125 2024-08-09 16:22:50,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=87600.0, ans=0.125 2024-08-09 16:22:55,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87600.0, ans=0.1 2024-08-09 16:23:00,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-09 16:23:05,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-08-09 16:23:11,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87800.0, ans=0.1 2024-08-09 16:23:23,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=87800.0, ans=0.0 2024-08-09 16:23:28,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.86 vs. 
limit=15.0 2024-08-09 16:23:32,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=87900.0, ans=0.125 2024-08-09 16:23:39,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.881e+01 3.394e+01 4.029e+01 7.137e+01, threshold=6.788e+01, percent-clipped=1.0 2024-08-09 16:23:39,404 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8800, loss[loss=0.1149, beats_loss=0.01522, ecapa_loss=0.0003902, whisper_loss=0.09574, over 22418.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01321, ecapa_loss=0.0004248, whisper_loss=0.1059, over 3890507.27 frames. ], batch size: 87, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:23:42,999 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 16:23:50,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=88000.0, ans=0.2 2024-08-09 16:23:55,239 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:24:15,446 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 16:24:18,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88200.0, ans=0.125 2024-08-09 16:24:27,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=88300.0, ans=0.125 2024-08-09 16:24:32,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=88300.0, ans=0.125 2024-08-09 16:24:41,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88400.0, ans=0.1 2024-08-09 16:24:51,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8850, loss[loss=0.12, beats_loss=0.0137, ecapa_loss=0.0003675, whisper_loss=0.1026, over 20103.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01323, ecapa_loss=0.0004244, whisper_loss=0.1052, over 3915399.28 frames. ], batch size: 78, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:25:04,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88600.0, ans=0.1 2024-08-09 16:25:05,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88600.0, ans=0.1 2024-08-09 16:25:24,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88700.0, ans=0.1 2024-08-09 16:25:29,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88700.0, ans=0.1 2024-08-09 16:25:29,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=88700.0, ans=0.125 2024-08-09 16:25:47,728 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 16:25:49,167 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:25:59,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88900.0, ans=0.1 2024-08-09 16:26:01,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.897e+01 3.367e+01 4.055e+01 6.951e+01, threshold=6.734e+01, percent-clipped=1.0 2024-08-09 16:26:01,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8900, loss[loss=0.1304, beats_loss=0.01309, ecapa_loss=0.0004175, whisper_loss=0.1131, over 22363.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01332, ecapa_loss=0.0004223, whisper_loss=0.1039, over 3896870.13 frames. ], batch size: 90, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:26:12,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.17 vs. limit=10.0 2024-08-09 16:26:19,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89100.0, ans=0.0 2024-08-09 16:26:20,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=89100.0, ans=0.0 2024-08-09 16:26:33,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-09 16:26:38,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=89200.0, ans=0.1 2024-08-09 16:26:40,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. 
limit=6.0 2024-08-09 16:26:42,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89300.0, ans=0.125 2024-08-09 16:26:43,696 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 16:27:04,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=89400.0, ans=10.0 2024-08-09 16:27:10,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 8950, loss[loss=0.1332, beats_loss=0.01092, ecapa_loss=0.0004071, whisper_loss=0.1182, over 16549.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01342, ecapa_loss=0.0004224, whisper_loss=0.103, over 3871657.70 frames. ], batch size: 65, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:27:11,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-09 16:27:13,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=89500.0, ans=0.0 2024-08-09 16:27:22,079 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-09 16:27:30,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89600.0, ans=0.125 2024-08-09 16:27:31,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89600.0, ans=0.1 2024-08-09 16:27:40,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. 
limit=12.0 2024-08-09 16:27:49,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=89700.0, ans=0.07 2024-08-09 16:27:54,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.31 vs. limit=10.0 2024-08-09 16:27:57,399 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 16:27:57,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=89800.0, ans=0.0 2024-08-09 16:28:12,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=89900.0, ans=0.0 2024-08-09 16:28:20,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.962e+01 3.391e+01 3.948e+01 7.468e+01, threshold=6.781e+01, percent-clipped=1.0 2024-08-09 16:28:20,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9000, loss[loss=0.162, beats_loss=0.008393, ecapa_loss=0.0004647, whisper_loss=0.149, over 23713.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01341, ecapa_loss=0.0004233, whisper_loss=0.1028, over 3872881.43 frames. ], batch size: 89, lr: 3.60e-02, grad_scale: 512.0 2024-08-09 16:28:20,179 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 16:29:00,152 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on ASR_libri: loss=0.2932, beats_loss=0, ecapa_loss=0.001188, whisper_loss=0.2813, over 922467.00 frames. 2024-08-09 16:29:16,691 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on SV_voxceleb1: loss=0.01105, beats_loss=0, ecapa_loss=0.001105, whisper_loss=0, over 939242.00 frames. 
2024-08-09 16:30:21,738 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.9879, 1.6292, 2.0561, 1.5155], device='cuda:1') 2024-08-09 16:31:15,717 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on AT_audioset: loss=0.03209, beats_loss=0.03209, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 16:31:15,720 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 16:31:17,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=90000.0, ans=0.125 2024-08-09 16:31:25,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=90000.0, ans=0.0 2024-08-09 16:31:33,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90100.0, ans=0.1 2024-08-09 16:31:54,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=90200.0, ans=0.0 2024-08-09 16:32:12,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=90400.0, ans=0.04949747468305833 2024-08-09 16:32:24,203 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9050, loss[loss=0.1202, beats_loss=0.01351, ecapa_loss=0.0003739, whisper_loss=0.103, over 21329.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01333, ecapa_loss=0.0004243, whisper_loss=0.1037, over 3903605.57 frames. 
], batch size: 85, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:32:55,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90700.0, ans=0.1 2024-08-09 16:33:13,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90800.0, ans=0.125 2024-08-09 16:33:13,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=90800.0, ans=0.125 2024-08-09 16:33:27,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=90900.0, ans=0.2 2024-08-09 16:33:29,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2024-08-09 16:33:31,721 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 16:33:32,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.994e+01 3.542e+01 4.086e+01 6.210e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 16:33:32,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9100, loss[loss=0.1295, beats_loss=0.0145, ecapa_loss=0.0004085, whisper_loss=0.1109, over 23117.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01322, ecapa_loss=0.0004275, whisper_loss=0.1045, over 3887084.00 frames. ], batch size: 93, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:33:38,624 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 16:33:40,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=91000.0, ans=0.025 2024-08-09 16:33:42,716 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
20 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-09 16:33:47,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=91100.0, ans=0.2 2024-08-09 16:33:49,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2024-08-09 16:33:56,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=91100.0, ans=0.125 2024-08-09 16:34:07,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=91200.0, ans=0.2 2024-08-09 16:34:18,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=91300.0, ans=0.125 2024-08-09 16:34:22,218 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 16:34:35,530 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 16:34:41,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9150, loss[loss=0.1427, beats_loss=0.01304, ecapa_loss=0.0004086, whisper_loss=0.1255, over 17903.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01323, ecapa_loss=0.0004249, whisper_loss=0.1047, over 3891865.89 frames. ], batch size: 70, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:34:54,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=91600.0, ans=0.0 2024-08-09 16:35:00,627 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
34 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 16:35:06,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91600.0, ans=0.1 2024-08-09 16:35:06,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=91600.0, ans=0.2 2024-08-09 16:35:26,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91800.0, ans=0.125 2024-08-09 16:35:39,944 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 16:35:42,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=91900.0, ans=0.0 2024-08-09 16:35:49,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.827e+01 3.202e+01 3.925e+01 7.636e+01, threshold=6.404e+01, percent-clipped=0.0 2024-08-09 16:35:49,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9200, loss[loss=0.1296, beats_loss=0.01584, ecapa_loss=0.0003933, whisper_loss=0.1098, over 22746.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01341, ecapa_loss=0.0004228, whisper_loss=0.1046, over 3921121.08 frames. ], batch size: 92, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:35:50,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-08-09 16:35:58,497 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
9 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 16:36:01,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92000.0, ans=0.0 2024-08-09 16:36:11,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-09 16:36:24,822 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-09 16:36:27,706 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.629e+00 2024-08-09 16:36:29,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=92200.0, ans=0.05 2024-08-09 16:36:45,208 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 16:36:55,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=92400.0, ans=0.125 2024-08-09 16:36:58,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9250, loss[loss=0.1425, beats_loss=0.01144, ecapa_loss=0.0003991, whisper_loss=0.1271, over 23651.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.01332, ecapa_loss=0.0004213, whisper_loss=0.1048, over 3909190.08 frames. 
], batch size: 90, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:37:06,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=92500.0, ans=0.125 2024-08-09 16:37:09,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=92500.0, ans=22.5 2024-08-09 16:37:16,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=92600.0, ans=0.04949747468305833 2024-08-09 16:37:24,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92600.0, ans=0.1 2024-08-09 16:37:31,604 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 16:37:35,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=92700.0, ans=0.125 2024-08-09 16:37:36,686 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-09 16:37:41,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92800.0, ans=0.125 2024-08-09 16:37:42,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=92800.0, ans=0.09899494936611666 2024-08-09 16:38:07,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.067e+01 3.450e+01 4.093e+01 6.352e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 16:38:07,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9300, loss[loss=0.1405, beats_loss=0.009368, ecapa_loss=0.0004419, whisper_loss=0.1267, over 17316.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01331, ecapa_loss=0.0004179, whisper_loss=0.1051, over 3921881.11 frames. 
], batch size: 67, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:38:09,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93000.0, ans=0.1 2024-08-09 16:38:11,621 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 16:38:24,027 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 16:38:31,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=93100.0, ans=0.125 2024-08-09 16:38:32,310 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 16:38:35,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-09 16:38:42,014 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 16:38:52,752 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-09 16:38:56,962 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 16:39:01,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=93400.0, ans=0.0 2024-08-09 16:39:03,854 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 16:39:12,202 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-09 16:39:13,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=93400.0, ans=0.2 2024-08-09 16:39:15,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9350, loss[loss=0.1101, beats_loss=0.01361, ecapa_loss=0.0004981, whisper_loss=0.09155, over 19200.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.0133, ecapa_loss=0.00042, whisper_loss=0.1046, over 3883025.65 frames. ], batch size: 80, lr: 3.56e-02, grad_scale: 512.0 2024-08-09 16:39:23,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=93500.0, ans=22.5 2024-08-09 16:39:26,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=93500.0, ans=0.2 2024-08-09 16:39:29,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=93600.0, ans=0.2 2024-08-09 16:39:30,310 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:39:37,067 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 16:39:42,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=93700.0, ans=0.05 2024-08-09 16:39:44,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. 
limit=12.0 2024-08-09 16:39:48,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93700.0, ans=0.0 2024-08-09 16:39:52,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=93700.0, ans=0.015 2024-08-09 16:39:53,167 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-09 16:39:58,264 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 16:40:24,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.20 vs. limit=10.0 2024-08-09 16:40:24,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.921e+01 3.226e+01 3.791e+01 1.210e+02, threshold=6.451e+01, percent-clipped=3.0 2024-08-09 16:40:24,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9400, loss[loss=0.1186, beats_loss=0.0123, ecapa_loss=0.000482, whisper_loss=0.1014, over 18842.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01318, ecapa_loss=0.0004233, whisper_loss=0.1051, over 3874276.80 frames. ], batch size: 78, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:40:41,107 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 16:40:49,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94100.0, ans=0.1 2024-08-09 16:40:58,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.90 vs. 
limit=22.5 2024-08-09 16:41:00,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=94200.0, ans=0.0 2024-08-09 16:41:11,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=94300.0, ans=0.05 2024-08-09 16:41:19,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:25,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94400.0, ans=0.125 2024-08-09 16:41:32,640 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9450, loss[loss=0.1137, beats_loss=0.01146, ecapa_loss=0.000405, whisper_loss=0.0982, over 15386.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01319, ecapa_loss=0.0004209, whisper_loss=0.1052, over 3871306.36 frames. ], batch size: 61, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:42:04,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=94700.0, ans=0.2 2024-08-09 16:42:08,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=18.70 vs. limit=15.0 2024-08-09 16:42:16,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2024-08-09 16:42:40,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.979e+01 3.573e+01 4.112e+01 7.498e+01, threshold=7.146e+01, percent-clipped=2.0 2024-08-09 16:42:40,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9500, loss[loss=0.1441, beats_loss=0.009927, ecapa_loss=0.0004557, whisper_loss=0.1297, over 20883.00 frames. 
], tot_loss[loss=0.1222, beats_loss=0.01328, ecapa_loss=0.0004182, whisper_loss=0.1048, over 3873087.06 frames. ], batch size: 79, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:43:07,050 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:43:14,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=15.0 2024-08-09 16:43:14,990 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-09 16:43:21,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=95300.0, ans=0.0 2024-08-09 16:43:48,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9550, loss[loss=0.1187, beats_loss=0.01384, ecapa_loss=0.0004601, whisper_loss=0.1003, over 21328.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01326, ecapa_loss=0.0004202, whisper_loss=0.1043, over 3892011.35 frames. ], batch size: 89, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:44:19,321 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 16:44:19,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95700.0, ans=0.125 2024-08-09 16:44:22,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=95700.0, ans=0.0 2024-08-09 16:44:31,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. 
limit=15.0 2024-08-09 16:44:34,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=95800.0, ans=0.125 2024-08-09 16:44:51,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=95900.0, ans=0.0 2024-08-09 16:44:54,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=95900.0, ans=0.2 2024-08-09 16:44:56,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.093e+01 3.544e+01 4.156e+01 7.056e+01, threshold=7.088e+01, percent-clipped=0.0 2024-08-09 16:44:56,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9600, loss[loss=0.1197, beats_loss=0.01433, ecapa_loss=0.0004362, whisper_loss=0.101, over 19567.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01309, ecapa_loss=0.0004249, whisper_loss=0.1053, over 3876422.67 frames. ], batch size: 80, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:44:57,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=96000.0, ans=0.2 2024-08-09 16:44:59,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=96000.0, ans=0.05 2024-08-09 16:44:59,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96000.0, ans=0.1 2024-08-09 16:45:15,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=96100.0, ans=0.125 2024-08-09 16:45:17,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.60 vs. 
limit=22.5 2024-08-09 16:45:31,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=96200.0, ans=0.125 2024-08-09 16:45:34,537 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 16:45:35,756 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-09 16:45:37,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=96300.0, ans=0.025 2024-08-09 16:46:04,509 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9650, loss[loss=0.1079, beats_loss=0.01409, ecapa_loss=0.0004156, whisper_loss=0.08968, over 15598.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01307, ecapa_loss=0.000423, whisper_loss=0.1048, over 3863757.97 frames. ], batch size: 64, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:46:38,187 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 16:47:12,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.968e+01 3.449e+01 4.387e+01 7.611e+01, threshold=6.898e+01, percent-clipped=2.0 2024-08-09 16:47:12,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9700, loss[loss=0.122, beats_loss=0.01426, ecapa_loss=0.0003513, whisper_loss=0.1042, over 17782.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01301, ecapa_loss=0.0004223, whisper_loss=0.1047, over 3858715.97 frames. 
], batch size: 65, lr: 3.52e-02, grad_scale: 512.0 2024-08-09 16:47:13,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=97000.0, ans=0.2 2024-08-09 16:47:18,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=97000.0, ans=0.125 2024-08-09 16:47:19,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=97000.0, ans=0.0 2024-08-09 16:47:23,391 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 16:47:36,352 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-09 16:47:37,715 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 16:47:37,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97100.0, ans=0.1 2024-08-09 16:47:47,981 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 16:47:52,039 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 16:47:56,166 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
28 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-09 16:47:57,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=97300.0, ans=0.125 2024-08-09 16:47:57,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=97300.0, ans=0.125 2024-08-09 16:48:14,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=97400.0, ans=0.125 2024-08-09 16:48:17,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=97400.0, ans=0.0 2024-08-09 16:48:22,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9750, loss[loss=0.1356, beats_loss=0.01328, ecapa_loss=0.0004719, whisper_loss=0.1176, over 22649.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01314, ecapa_loss=0.0004201, whisper_loss=0.1039, over 3845663.81 frames. ], batch size: 91, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:48:25,077 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 16:48:31,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-09 16:48:34,734 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 16:48:51,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97700.0, ans=0.1 2024-08-09 16:48:58,686 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 16:49:04,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=97800.0, ans=0.0 2024-08-09 16:49:22,984 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 16:49:30,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98000.0, ans=0.125 2024-08-09 16:49:31,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.868e+01 3.333e+01 3.887e+01 7.337e+01, threshold=6.667e+01, percent-clipped=1.0 2024-08-09 16:49:31,381 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9800, loss[loss=0.1277, beats_loss=0.01317, ecapa_loss=0.0003557, whisper_loss=0.1109, over 22439.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01321, ecapa_loss=0.0004199, whisper_loss=0.104, over 3862186.12 frames. ], batch size: 89, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:50:03,399 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 16:50:10,320 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=12.0 2024-08-09 16:50:15,475 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 16:50:20,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.15 vs. limit=22.5 2024-08-09 16:50:47,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9850, loss[loss=0.1123, beats_loss=0.01285, ecapa_loss=0.0004586, whisper_loss=0.09487, over 15656.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01316, ecapa_loss=0.0004185, whisper_loss=0.1043, over 3856141.22 frames. 
], batch size: 65, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:50:55,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=98500.0, ans=0.125 2024-08-09 16:50:55,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98500.0, ans=0.1 2024-08-09 16:51:25,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=98700.0, ans=0.0 2024-08-09 16:51:28,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=98700.0, ans=0.125 2024-08-09 16:51:35,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=98700.0, ans=10.0 2024-08-09 16:51:51,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2024-08-09 16:51:54,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98900.0, ans=0.125 2024-08-09 16:52:11,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.990e+01 3.470e+01 4.121e+01 8.675e+01, threshold=6.939e+01, percent-clipped=3.0 2024-08-09 16:52:11,614 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9900, loss[loss=0.1265, beats_loss=0.01165, ecapa_loss=0.0004713, whisper_loss=0.1102, over 22474.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.0132, ecapa_loss=0.0004169, whisper_loss=0.1048, over 3881012.47 frames. ], batch size: 91, lr: 3.50e-02, grad_scale: 512.0 2024-08-09 16:52:21,363 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
31 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 16:52:30,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=99100.0, ans=0.125 2024-08-09 16:52:35,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-09 16:52:54,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=99200.0, ans=0.0 2024-08-09 16:53:15,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=99300.0, ans=0.0 2024-08-09 16:53:35,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 9950, loss[loss=0.1054, beats_loss=0.01528, ecapa_loss=0.0004172, whisper_loss=0.08594, over 21352.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.0132, ecapa_loss=0.0004183, whisper_loss=0.1045, over 3862987.96 frames. ], batch size: 88, lr: 3.49e-02, grad_scale: 512.0 2024-08-09 16:53:36,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=99500.0, ans=0.125 2024-08-09 16:53:45,056 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 16:53:47,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99500.0, ans=0.1 2024-08-09 16:53:52,692 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:53:53,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-09 16:54:01,073 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 16:54:07,667 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 16:54:12,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=99700.0, ans=0.125 2024-08-09 16:54:21,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=99700.0, ans=0.07 2024-08-09 16:54:32,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=99800.0, ans=0.125 2024-08-09 16:54:34,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=99800.0, ans=0.125 2024-08-09 16:54:38,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99900.0, ans=0.1 2024-08-09 16:54:53,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.893e+01 3.392e+01 3.870e+01 8.367e+01, threshold=6.783e+01, percent-clipped=1.0 2024-08-09 16:54:53,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10000, loss[loss=0.1278, beats_loss=0.01667, ecapa_loss=0.0003349, whisper_loss=0.1078, over 23259.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01314, ecapa_loss=0.0004174, whisper_loss=0.1047, over 3840468.11 frames. 
], batch size: 92, lr: 3.49e-02, grad_scale: 1024.0 2024-08-09 16:54:55,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=100000.0, ans=0.2 2024-08-09 16:54:59,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=100000.0, ans=0.0 2024-08-09 16:55:05,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2024-08-09 16:55:05,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=8.0 2024-08-09 16:55:08,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-09 16:55:12,176 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-09 16:55:17,997 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-09 16:55:19,339 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 16:55:29,370 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-09 16:55:33,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=100200.0, ans=0.2 2024-08-09 16:55:42,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100300.0, ans=0.1 2024-08-09 16:55:49,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=100400.0, ans=0.0 2024-08-09 16:56:03,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10050, loss[loss=0.1044, beats_loss=0.01461, ecapa_loss=0.0004606, whisper_loss=0.08523, over 18492.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01312, ecapa_loss=0.0004176, whisper_loss=0.104, over 3828883.21 frames. ], batch size: 80, lr: 3.48e-02, grad_scale: 1024.0 2024-08-09 16:56:05,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.68 vs. limit=15.0 2024-08-09 16:56:08,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-08-09 16:56:14,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=100500.0, ans=0.0 2024-08-09 16:56:32,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. 
limit=15.0 2024-08-09 16:56:52,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=100800.0, ans=0.125 2024-08-09 16:57:12,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.921e+01 3.378e+01 4.111e+01 6.632e+01, threshold=6.756e+01, percent-clipped=0.0 2024-08-09 16:57:12,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10100, loss[loss=0.1085, beats_loss=0.01182, ecapa_loss=0.0004633, whisper_loss=0.09204, over 22192.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01314, ecapa_loss=0.0004161, whisper_loss=0.1044, over 3861724.13 frames. ], batch size: 92, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:57:29,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=101100.0, ans=0.04949747468305833 2024-08-09 16:57:33,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=101100.0, ans=0.0 2024-08-09 16:57:46,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.57 vs. limit=22.5 2024-08-09 16:58:02,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=101300.0, ans=0.07 2024-08-09 16:58:05,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=101300.0, ans=0.125 2024-08-09 16:58:14,069 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 16:58:14,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=101400.0, ans=0.125 2024-08-09 16:58:20,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10150, loss[loss=0.1302, beats_loss=0.01456, ecapa_loss=0.0003428, whisper_loss=0.1122, over 20785.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01304, ecapa_loss=0.000417, whisper_loss=0.1055, over 3883040.65 frames. ], batch size: 78, lr: 3.47e-02, grad_scale: 1024.0 2024-08-09 16:58:29,024 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 16:58:42,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=101600.0, ans=0.07 2024-08-09 16:58:47,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101700.0, ans=0.1 2024-08-09 16:59:08,273 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 16:59:09,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=101800.0, ans=0.125 2024-08-09 16:59:21,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=101900.0, ans=0.05 2024-08-09 16:59:28,721 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 16:59:29,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.923e+01 3.411e+01 4.089e+01 6.898e+01, threshold=6.822e+01, percent-clipped=2.0 2024-08-09 16:59:30,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10200, loss[loss=0.1187, beats_loss=0.01139, ecapa_loss=0.0004316, whisper_loss=0.103, over 20806.00 frames. 
], tot_loss[loss=0.1221, beats_loss=0.01307, ecapa_loss=0.0004162, whisper_loss=0.1049, over 3866143.71 frames. ], batch size: 81, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 16:59:30,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.51 vs. limit=10.0 2024-08-09 16:59:32,984 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 16:59:43,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-08-09 16:59:45,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=102100.0, ans=0.2 2024-08-09 16:59:53,764 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 17:00:00,873 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 17:00:05,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=102200.0, ans=0.125 2024-08-09 17:00:23,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=102400.0, ans=0.125 2024-08-09 17:00:26,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=102400.0, ans=0.125 2024-08-09 17:00:32,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=102400.0, ans=0.125 2024-08-09 17:00:38,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10250, loss[loss=0.1309, beats_loss=0.01361, ecapa_loss=0.0004084, whisper_loss=0.1132, over 22419.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01318, ecapa_loss=0.0004127, whisper_loss=0.1041, over 3868438.00 frames. 
], batch size: 89, lr: 3.46e-02, grad_scale: 1024.0 2024-08-09 17:00:40,754 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.630e-01 2024-08-09 17:00:51,428 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 17:01:01,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=102600.0, ans=0.125 2024-08-09 17:01:05,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=102700.0, ans=0.125 2024-08-09 17:01:13,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-09 17:01:34,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=102900.0, ans=0.2 2024-08-09 17:01:47,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.938e+01 3.467e+01 4.292e+01 7.706e+01, threshold=6.934e+01, percent-clipped=1.0 2024-08-09 17:01:47,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10300, loss[loss=0.06254, beats_loss=0.01862, ecapa_loss=0.0002626, whisper_loss=0.04129, over 13890.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01323, ecapa_loss=0.0004113, whisper_loss=0.1042, over 3889487.78 frames. ], batch size: 55, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:01:48,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=103000.0, ans=0.125 2024-08-09 17:01:50,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=103000.0, ans=0.125 2024-08-09 17:01:58,298 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 17:02:00,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=103100.0, ans=0.125 2024-08-09 17:02:03,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=103100.0, ans=0.2 2024-08-09 17:02:08,027 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-09 17:02:08,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=103100.0, ans=0.125 2024-08-09 17:02:15,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=103200.0, ans=0.125 2024-08-09 17:02:24,172 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 17:02:40,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=103400.0, ans=0.125 2024-08-09 17:02:43,070 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 17:02:50,864 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 17:02:52,189 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 17:02:53,676 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-09 17:02:54,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10350, loss[loss=0.1436, beats_loss=0.01027, ecapa_loss=0.0003824, whisper_loss=0.1295, over 15881.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01326, ecapa_loss=0.0004121, whisper_loss=0.1039, over 3903541.89 frames. 
], batch size: 59, lr: 3.45e-02, grad_scale: 1024.0 2024-08-09 17:02:56,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=103500.0, ans=0.0 2024-08-09 17:03:33,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=103700.0, ans=0.09899494936611666 2024-08-09 17:03:36,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=103800.0, ans=0.125 2024-08-09 17:03:38,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=103800.0, ans=0.0 2024-08-09 17:03:41,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103800.0, ans=0.0 2024-08-09 17:03:45,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=103800.0, ans=0.2 2024-08-09 17:03:49,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=103900.0, ans=0.0 2024-08-09 17:04:01,538 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 17:04:02,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 3.016e+01 3.413e+01 4.405e+01 7.924e+01, threshold=6.827e+01, percent-clipped=1.0 2024-08-09 17:04:02,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10400, loss[loss=0.1129, beats_loss=0.01363, ecapa_loss=0.0004568, whisper_loss=0.09469, over 17805.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01332, ecapa_loss=0.0004079, whisper_loss=0.1039, over 3924705.66 frames. ], batch size: 73, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:04:24,272 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 17:04:38,135 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 17:04:43,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=104300.0, ans=0.0 2024-08-09 17:04:47,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104300.0, ans=0.125 2024-08-09 17:04:50,209 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 17:04:50,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104300.0, ans=0.1 2024-08-09 17:05:04,112 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 17:05:12,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10450, loss[loss=0.1065, beats_loss=0.0116, ecapa_loss=0.0004851, whisper_loss=0.09003, over 14275.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01337, ecapa_loss=0.0004085, whisper_loss=0.1029, over 3897198.90 frames. ], batch size: 59, lr: 3.44e-02, grad_scale: 1024.0 2024-08-09 17:05:14,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=104500.0, ans=0.0 2024-08-09 17:05:26,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=104600.0, ans=0.125 2024-08-09 17:05:34,583 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-09 17:05:39,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.13 vs. 
limit=15.0 2024-08-09 17:05:40,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=104700.0, ans=0.04949747468305833 2024-08-09 17:06:02,892 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.308e-01 2024-08-09 17:06:03,930 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 30 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 17:06:04,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=104800.0, ans=0.0 2024-08-09 17:06:15,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=104900.0, ans=0.0 2024-08-09 17:06:21,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=105000.0, ans=0.2 2024-08-09 17:06:22,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 3.012e+01 3.451e+01 3.999e+01 6.423e+01, threshold=6.903e+01, percent-clipped=0.0 2024-08-09 17:06:22,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10500, loss[loss=0.1318, beats_loss=0.01096, ecapa_loss=0.0003894, whisper_loss=0.117, over 15867.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.0133, ecapa_loss=0.000409, whisper_loss=0.1031, over 3873693.17 frames. ], batch size: 63, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:06:25,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=105000.0, ans=0.0 2024-08-09 17:06:26,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=105000.0, ans=0.125 2024-08-09 17:06:29,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. 
limit=15.0 2024-08-09 17:06:30,014 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 15 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 17:06:38,029 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 17:06:57,536 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-09 17:07:03,059 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 17:07:17,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=105400.0, ans=0.125 2024-08-09 17:07:18,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=105400.0, ans=0.035 2024-08-09 17:07:21,249 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 17:07:24,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=105400.0, ans=0.0 2024-08-09 17:07:31,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=105500.0, ans=0.0 2024-08-09 17:07:32,205 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10550, loss[loss=0.1144, beats_loss=0.01224, ecapa_loss=0.0004955, whisper_loss=0.09725, over 14968.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01324, ecapa_loss=0.00041, whisper_loss=0.1033, over 3872360.06 frames. ], batch size: 62, lr: 3.43e-02, grad_scale: 1024.0 2024-08-09 17:07:32,435 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 17:07:33,894 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 17:07:41,473 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2024-08-09 17:07:42,155 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 17:07:46,057 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-09 17:07:58,952 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 17:08:02,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=105700.0, ans=0.125 2024-08-09 17:08:08,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=105700.0, ans=0.125 2024-08-09 17:08:27,805 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-09 17:08:35,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=105900.0, ans=0.5 2024-08-09 17:08:41,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.989e+01 3.482e+01 4.095e+01 9.318e+01, threshold=6.964e+01, percent-clipped=2.0 2024-08-09 17:08:41,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10600, loss[loss=0.1242, beats_loss=0.01366, ecapa_loss=0.000294, whisper_loss=0.1076, over 22731.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01329, ecapa_loss=0.0004079, whisper_loss=0.1029, over 3913391.15 frames. ], batch size: 87, lr: 3.42e-02, grad_scale: 1024.0 2024-08-09 17:08:55,349 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 17:09:05,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=106100.0, ans=0.2 2024-08-09 17:09:12,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=106200.0, ans=0.0 2024-08-09 17:09:14,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=106200.0, ans=0.125 2024-08-09 17:09:17,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=15.0 2024-08-09 17:09:34,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.37 vs. limit=22.5 2024-08-09 17:09:35,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-08-09 17:09:46,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=106400.0, ans=0.05 2024-08-09 17:09:51,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10650, loss[loss=0.1258, beats_loss=0.01546, ecapa_loss=0.0003795, whisper_loss=0.1065, over 17208.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01326, ecapa_loss=0.000404, whisper_loss=0.1032, over 3904059.71 frames. ], batch size: 70, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:09:56,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=106500.0, ans=0.125 2024-08-09 17:10:03,240 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 17:10:09,150 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:10:09,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-09 17:10:11,647 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 17:10:15,652 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 17:10:32,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=106800.0, ans=0.125 2024-08-09 17:10:33,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0 2024-08-09 17:10:45,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106800.0, ans=0.125 2024-08-09 17:10:53,357 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 17:11:01,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 3.109e+01 3.454e+01 4.119e+01 5.374e+01, threshold=6.908e+01, percent-clipped=0.0 2024-08-09 17:11:01,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10700, loss[loss=0.1288, beats_loss=0.01563, ecapa_loss=0.0003157, whisper_loss=0.1101, over 23068.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01326, ecapa_loss=0.0004038, whisper_loss=0.1038, over 3932679.90 frames. ], batch size: 90, lr: 3.41e-02, grad_scale: 1024.0 2024-08-09 17:11:13,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.33 vs. 
limit=22.5 2024-08-09 17:11:43,489 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 17:12:10,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10750, loss[loss=0.1452, beats_loss=0.01193, ecapa_loss=0.0003968, whisper_loss=0.1293, over 22250.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01326, ecapa_loss=0.0004013, whisper_loss=0.1042, over 3941482.49 frames. ], batch size: 88, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:12:16,023 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-09 17:12:16,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=107500.0, ans=0.0 2024-08-09 17:12:20,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-08-09 17:12:31,897 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-09 17:12:40,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=107700.0, ans=0.125 2024-08-09 17:13:00,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=22.5 2024-08-09 17:13:03,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=107900.0, ans=0.125 2024-08-09 17:13:07,413 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-09 17:13:07,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=107900.0, ans=0.125 2024-08-09 17:13:12,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0 2024-08-09 17:13:18,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.960e+01 3.558e+01 4.572e+01 9.073e+01, threshold=7.116e+01, percent-clipped=3.0 2024-08-09 17:13:18,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10800, loss[loss=0.1309, beats_loss=0.01292, ecapa_loss=0.0003865, whisper_loss=0.1141, over 22520.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01322, ecapa_loss=0.0004022, whisper_loss=0.1047, over 3949028.80 frames. ], batch size: 90, lr: 3.40e-02, grad_scale: 1024.0 2024-08-09 17:13:48,281 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-09 17:13:48,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=108200.0, ans=0.0 2024-08-09 17:13:52,834 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
30 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 17:13:57,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=108200.0, ans=0.2 2024-08-09 17:14:12,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=108400.0, ans=0.0 2024-08-09 17:14:25,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=108500.0, ans=0.025 2024-08-09 17:14:26,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10850, loss[loss=0.113, beats_loss=0.01367, ecapa_loss=0.0004219, whisper_loss=0.09512, over 16008.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.0131, ecapa_loss=0.0004012, whisper_loss=0.1052, over 3934338.87 frames. ], batch size: 64, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:14:39,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=108600.0, ans=0.125 2024-08-09 17:14:51,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=108600.0, ans=22.5 2024-08-09 17:14:52,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=108600.0, ans=0.95 2024-08-09 17:15:17,099 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 17:15:18,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108800.0, ans=0.1 2024-08-09 17:15:29,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=108900.0, ans=0.2 2024-08-09 17:15:30,165 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 17:15:35,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.154e+01 3.497e+01 4.138e+01 7.474e+01, threshold=6.993e+01, percent-clipped=1.0 2024-08-09 17:15:35,503 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10900, loss[loss=0.1317, beats_loss=0.01352, ecapa_loss=0.0004284, whisper_loss=0.1139, over 16059.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01301, ecapa_loss=0.000403, whisper_loss=0.106, over 3933743.47 frames. ], batch size: 67, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:15:37,185 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-09 17:15:37,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109000.0, ans=0.1 2024-08-09 17:15:41,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=109000.0, ans=0.125 2024-08-09 17:15:42,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=109000.0, ans=0.125 2024-08-09 17:15:49,340 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 17:16:03,220 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 17:16:08,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=109200.0, ans=0.0 2024-08-09 17:16:08,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=109200.0, ans=0.0 2024-08-09 17:16:18,113 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-09 17:16:30,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=109400.0, ans=0.07 2024-08-09 17:16:38,267 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 17:16:39,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=109400.0, ans=0.04949747468305833 2024-08-09 17:16:42,489 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 17:16:43,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 10950, loss[loss=0.12, beats_loss=0.01329, ecapa_loss=0.0003294, whisper_loss=0.1034, over 17379.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01305, ecapa_loss=0.0003974, whisper_loss=0.1055, over 3922924.98 frames. ], batch size: 67, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:16:58,133 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 17:16:59,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=109600.0, ans=0.125 2024-08-09 17:17:05,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.05 vs. limit=22.5 2024-08-09 17:17:10,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=109700.0, ans=0.015 2024-08-09 17:17:18,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-08-09 17:17:22,056 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 17:17:28,772 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
15 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-09 17:17:34,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=109800.0, ans=0.07 2024-08-09 17:17:50,254 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 17:17:51,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.947e+01 3.240e+01 3.931e+01 5.659e+01, threshold=6.481e+01, percent-clipped=0.0 2024-08-09 17:17:51,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11000, loss[loss=0.1374, beats_loss=0.0132, ecapa_loss=0.0004107, whisper_loss=0.1201, over 21748.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.0131, ecapa_loss=0.0003973, whisper_loss=0.1051, over 3947465.09 frames. ], batch size: 90, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:18:09,826 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-09 17:18:18,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=110200.0, ans=0.2 2024-08-09 17:18:27,656 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 17:18:30,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=110200.0, ans=0.0 2024-08-09 17:18:55,318 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 17:18:58,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=110400.0, ans=0.125 2024-08-09 17:18:59,433 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 17:19:00,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11050, loss[loss=0.1233, beats_loss=0.01225, ecapa_loss=0.0003705, whisper_loss=0.1074, over 20570.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.0131, ecapa_loss=0.0003971, whisper_loss=0.1046, over 3923231.47 frames. ], batch size: 80, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:19:46,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=110800.0, ans=0.0 2024-08-09 17:19:47,131 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-09 17:19:59,473 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-09 17:19:59,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=110900.0, ans=0.125 2024-08-09 17:20:05,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=110900.0, ans=0.09899494936611666 2024-08-09 17:20:09,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=111000.0, ans=0.0 2024-08-09 17:20:10,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 3.030e+01 3.567e+01 4.272e+01 6.137e+01, threshold=7.134e+01, percent-clipped=0.0 2024-08-09 17:20:10,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11100, loss[loss=0.135, beats_loss=0.01106, ecapa_loss=0.0004794, whisper_loss=0.1192, over 22565.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01312, ecapa_loss=0.0004013, whisper_loss=0.1043, over 3919036.16 frames. ], batch size: 94, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:20:18,707 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-09 17:20:20,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=111000.0, ans=0.125 2024-08-09 17:20:24,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111100.0, ans=0.1 2024-08-09 17:20:26,007 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 17:20:31,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2024-08-09 17:21:17,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=111400.0, ans=0.125 2024-08-09 17:21:19,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11150, loss[loss=0.1338, beats_loss=0.01096, ecapa_loss=0.0003998, whisper_loss=0.1189, over 20998.00 frames. ], tot_loss[loss=0.121, beats_loss=0.0131, ecapa_loss=0.0003992, whisper_loss=0.1039, over 3925699.48 frames. ], batch size: 85, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:21:25,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-09 17:21:26,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111500.0, ans=0.1 2024-08-09 17:21:29,153 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 17:21:31,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=111500.0, ans=0.025 2024-08-09 17:21:36,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=111600.0, ans=0.2 2024-08-09 17:21:43,013 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 35 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-09 17:21:57,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=111700.0, ans=0.035 2024-08-09 17:22:06,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=111800.0, ans=0.125 2024-08-09 17:22:06,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=111800.0, ans=0.125 2024-08-09 17:22:11,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111800.0, ans=0.1 2024-08-09 17:22:22,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=111900.0, ans=0.125 2024-08-09 17:22:28,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.940e+01 3.532e+01 4.042e+01 6.455e+01, threshold=7.065e+01, percent-clipped=0.0 2024-08-09 17:22:28,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11200, loss[loss=0.1489, beats_loss=0.01074, ecapa_loss=0.0004416, whisper_loss=0.1337, over 13894.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01314, ecapa_loss=0.0003981, whisper_loss=0.1039, over 3933262.38 frames. 
], batch size: 53, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:22:35,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=112000.0, ans=0.0 2024-08-09 17:22:41,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=112100.0, ans=0.125 2024-08-09 17:22:42,651 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 17:22:47,989 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 17:23:00,839 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 17:23:02,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=112200.0, ans=0.125 2024-08-09 17:23:06,489 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 17:23:09,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112300.0, ans=0.1 2024-08-09 17:23:17,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=112300.0, ans=0.125 2024-08-09 17:23:34,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=112400.0, ans=0.2 2024-08-09 17:23:36,538 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 31 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-09 17:23:37,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11250, loss[loss=0.1476, beats_loss=0.007826, ecapa_loss=0.0004884, whisper_loss=0.1349, over 18130.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01311, ecapa_loss=0.0004001, whisper_loss=0.1047, over 3937341.04 frames. 
], batch size: 73, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:23:52,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2024-08-09 17:24:16,265 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.876e-01 2024-08-09 17:24:21,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=112800.0, ans=0.125 2024-08-09 17:24:35,252 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 17:24:45,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=112900.0, ans=0.125 2024-08-09 17:24:47,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.986e+01 3.509e+01 4.225e+01 7.875e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-09 17:24:47,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11300, loss[loss=0.119, beats_loss=0.01324, ecapa_loss=0.0003523, whisper_loss=0.1022, over 16353.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01311, ecapa_loss=0.0003979, whisper_loss=0.1042, over 3923578.77 frames. ], batch size: 63, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:25:00,353 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-09 17:25:12,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=113100.0, ans=0.125 2024-08-09 17:25:20,567 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 17:25:23,057 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 17:25:43,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2024-08-09 17:25:46,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=113400.0, ans=0.125 2024-08-09 17:25:49,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113400.0, ans=0.1 2024-08-09 17:25:56,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11350, loss[loss=0.1052, beats_loss=0.0129, ecapa_loss=0.0004174, whisper_loss=0.0881, over 13972.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01298, ecapa_loss=0.0003974, whisper_loss=0.1047, over 3941736.69 frames. ], batch size: 59, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:25:58,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=113500.0, ans=0.0 2024-08-09 17:26:06,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2024-08-09 17:26:08,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113500.0, ans=0.1 2024-08-09 17:26:09,349 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 17:26:11,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2024-08-09 17:26:13,412 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
30 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 17:26:30,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113700.0, ans=0.125 2024-08-09 17:26:31,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2024-08-09 17:26:39,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=113800.0, ans=0.125 2024-08-09 17:27:06,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.900e+01 3.368e+01 4.036e+01 6.013e+01, threshold=6.736e+01, percent-clipped=0.0 2024-08-09 17:27:06,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11400, loss[loss=0.1249, beats_loss=0.01081, ecapa_loss=0.0004114, whisper_loss=0.11, over 17909.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01299, ecapa_loss=0.000399, whisper_loss=0.1046, over 3894847.89 frames. ], batch size: 71, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:27:19,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=114100.0, ans=0.0 2024-08-09 17:27:21,674 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 17:27:22,796 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 17:27:46,736 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 17:27:52,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=114300.0, ans=0.125 2024-08-09 17:27:57,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.19 vs. 
limit=8.0 2024-08-09 17:28:15,891 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11450, loss[loss=0.1181, beats_loss=0.01269, ecapa_loss=0.0004808, whisper_loss=0.1006, over 14839.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01309, ecapa_loss=0.0003978, whisper_loss=0.1043, over 3908513.51 frames. ], batch size: 62, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:28:19,008 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 17:28:42,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=114600.0, ans=0.125 2024-08-09 17:28:43,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2024-08-09 17:28:47,512 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 17:29:02,209 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.601e+00 2024-08-09 17:29:11,985 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-09 17:29:16,002 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-09 17:29:23,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-08-09 17:29:26,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 3.054e+01 3.515e+01 4.307e+01 8.084e+01, threshold=7.029e+01, percent-clipped=1.0 2024-08-09 17:29:27,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11500, loss[loss=0.1071, beats_loss=0.01681, ecapa_loss=0.0002196, whisper_loss=0.08813, over 22011.00 frames. 
], tot_loss[loss=0.1211, beats_loss=0.01314, ecapa_loss=0.0003967, whisper_loss=0.104, over 3881650.19 frames. ], batch size: 83, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:29:39,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=115000.0, ans=0.125 2024-08-09 17:29:53,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=115100.0, ans=0.125 2024-08-09 17:30:03,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2024-08-09 17:30:14,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=115300.0, ans=0.2 2024-08-09 17:30:15,562 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 17:30:17,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=115300.0, ans=0.125 2024-08-09 17:30:41,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11550, loss[loss=0.1407, beats_loss=0.01362, ecapa_loss=0.0003733, whisper_loss=0.1233, over 23064.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01309, ecapa_loss=0.0003982, whisper_loss=0.104, over 3884427.23 frames. ], batch size: 93, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:31:29,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115800.0, ans=0.1 2024-08-09 17:31:35,864 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 17:31:43,414 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
23 from LS+wenet, 18 from Vox, 39 from AS 2024-08-09 17:31:53,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.957e+01 3.409e+01 3.917e+01 8.485e+01, threshold=6.817e+01, percent-clipped=1.0 2024-08-09 17:31:53,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11600, loss[loss=0.1102, beats_loss=0.01224, ecapa_loss=0.0003965, whisper_loss=0.09402, over 23090.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01308, ecapa_loss=0.0003981, whisper_loss=0.1038, over 3898435.66 frames. ], batch size: 92, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:31:56,328 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 17:32:10,725 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS 2024-08-09 17:32:16,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=116100.0, ans=0.0 2024-08-09 17:32:21,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-09 17:32:44,682 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 from AS 2024-08-09 17:32:46,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=116300.0, ans=0.0 2024-08-09 17:32:48,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116300.0, ans=0.1 2024-08-09 17:32:53,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=116400.0, ans=0.2 2024-08-09 17:32:56,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116400.0, ans=0.1 2024-08-09 17:32:56,900 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 from AS 2024-08-09 17:33:00,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0 2024-08-09 17:33:07,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11650, loss[loss=0.08348, beats_loss=0.01688, ecapa_loss=0.0003752, whisper_loss=0.06285, over 14944.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01308, ecapa_loss=0.0003971, whisper_loss=0.1038, over 3933102.99 frames. ], batch size: 62, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:33:07,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.89 vs. 
limit=22.5 2024-08-09 17:33:10,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=116500.0, ans=0.0 2024-08-09 17:33:13,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=116500.0, ans=0.125 2024-08-09 17:33:22,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=116600.0, ans=0.125 2024-08-09 17:33:38,415 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 from AS 2024-08-09 17:33:46,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=15.0 2024-08-09 17:33:51,125 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 from AS 2024-08-09 17:33:53,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=116800.0, ans=0.2 2024-08-09 17:34:05,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=116900.0, ans=0.2 2024-08-09 17:34:10,604 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS 2024-08-09 17:34:14,865 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 17:34:18,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 3.106e+01 3.561e+01 4.217e+01 8.775e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 17:34:18,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11700, loss[loss=0.1118, beats_loss=0.01167, ecapa_loss=0.0003978, whisper_loss=0.09614, over 16430.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.0131, ecapa_loss=0.0003935, whisper_loss=0.1039, over 3927621.86 frames. 
], batch size: 67, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:34:35,745 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 23 from LS+wenet, 15 from Vox, 16 from AS 2024-08-09 17:34:47,370 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 from AS 2024-08-09 17:35:05,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=117300.0, ans=0.125 2024-08-09 17:35:21,529 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.321e-01 2024-08-09 17:35:22,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=117400.0, ans=0.125 2024-08-09 17:35:30,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11750, loss[loss=0.1422, beats_loss=0.01131, ecapa_loss=0.0003581, whisper_loss=0.1273, over 15677.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01317, ecapa_loss=0.0003914, whisper_loss=0.1038, over 3906257.17 frames. ], batch size: 56, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:35:31,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2024-08-09 17:35:45,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=12.0 2024-08-09 17:36:01,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=117700.0, ans=0.125 2024-08-09 17:36:06,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117700.0, ans=0.1 2024-08-09 17:36:10,053 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 17:36:18,338 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 17 from Vox, 27 from AS 2024-08-09 17:36:19,497 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-09 17:36:40,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.943e+01 3.344e+01 4.022e+01 9.659e+01, threshold=6.689e+01, percent-clipped=1.0 2024-08-09 17:36:40,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11800, loss[loss=0.1399, beats_loss=0.009616, ecapa_loss=0.0005274, whisper_loss=0.1251, over 15885.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01308, ecapa_loss=0.0003934, whisper_loss=0.1042, over 3875531.31 frames. ], batch size: 65, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:36:42,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=118000.0, ans=0.125 2024-08-09 17:36:58,142 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-09 17:37:30,948 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 17:37:37,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=118400.0, ans=0.0 2024-08-09 17:37:43,206 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 from AS 2024-08-09 17:37:47,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=118400.0, ans=0.0 2024-08-09 17:37:51,483 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11850, loss[loss=0.1434, beats_loss=0.01224, ecapa_loss=0.0003829, whisper_loss=0.1273, over 23233.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01311, ecapa_loss=0.0003906, whisper_loss=0.1041, over 3916229.79 frames. ], batch size: 92, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:38:00,272 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 27 from Vox, 37 from AS 2024-08-09 17:38:26,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.37 vs. limit=15.0 2024-08-09 17:38:51,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118900.0, ans=0.125 2024-08-09 17:38:52,814 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-09 17:39:01,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=118900.0, ans=0.07 2024-08-09 17:39:03,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.937e+01 3.452e+01 4.190e+01 6.711e+01, threshold=6.904e+01, percent-clipped=1.0 2024-08-09 17:39:03,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11900, loss[loss=0.1277, beats_loss=0.01179, ecapa_loss=0.000421, whisper_loss=0.1117, over 21777.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01315, ecapa_loss=0.000389, whisper_loss=0.1035, over 3947786.80 frames. ], batch size: 87, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:39:03,825 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 17:39:13,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2024-08-09 17:39:23,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119100.0, ans=0.125 2024-08-09 17:39:37,968 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 25 from Vox, 23 from AS 2024-08-09 17:39:48,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=119300.0, ans=0.125 2024-08-09 17:39:49,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.27 vs. limit=10.0 2024-08-09 17:39:51,015 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 from AS 2024-08-09 17:40:00,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=119300.0, ans=0.05 2024-08-09 17:40:17,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 11950, loss[loss=0.119, beats_loss=0.01374, ecapa_loss=0.0003882, whisper_loss=0.1014, over 21385.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01312, ecapa_loss=0.0003924, whisper_loss=0.1031, over 3912685.38 frames. ], batch size: 86, lr: 3.28e-02, grad_scale: 1024.0 2024-08-09 17:40:23,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119500.0, ans=0.1 2024-08-09 17:40:48,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-09 17:41:25,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=119900.0, ans=0.0 2024-08-09 17:41:32,898 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
25 from LS+wenet, 14 from Vox, 22 from AS 2024-08-09 17:41:36,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=120000.0, ans=10.0 2024-08-09 17:41:36,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.930e+01 3.462e+01 4.384e+01 7.473e+01, threshold=6.925e+01, percent-clipped=1.0 2024-08-09 17:41:36,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12000, loss[loss=0.1563, beats_loss=0.01097, ecapa_loss=0.000382, whisper_loss=0.1416, over 16204.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01306, ecapa_loss=0.0003913, whisper_loss=0.1038, over 3875950.75 frames. ], batch size: 61, lr: 3.28e-02, grad_scale: 2048.0 2024-08-09 17:41:36,941 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 17:42:24,889 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on ASR_libri: loss=0.2866, beats_loss=0, ecapa_loss=0.00111, whisper_loss=0.2755, over 922467.00 frames. 2024-08-09 17:42:44,863 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on SV_voxceleb1: loss=0.01049, beats_loss=0, ecapa_loss=0.001049, whisper_loss=0, over 939242.00 frames. 2024-08-09 17:44:38,292 INFO [train_multi_KD3.py:1149] (1/4) Epoch 1, validation on AT_audioset: loss=0.03131, beats_loss=0.03131, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 17:44:38,295 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 17:44:38,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=120000.0, ans=0.0 2024-08-09 17:44:49,358 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 18 from Vox, 27 from AS 2024-08-09 17:44:49,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=120000.0, ans=0.0 2024-08-09 17:44:59,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=120100.0, ans=0.125 2024-08-09 17:45:12,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-08-09 17:45:17,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120200.0, ans=0.0 2024-08-09 17:45:57,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12050, loss[loss=0.1425, beats_loss=0.009464, ecapa_loss=0.0004638, whisper_loss=0.1284, over 18397.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.0132, ecapa_loss=0.0003884, whisper_loss=0.1031, over 3849667.72 frames. ], batch size: 71, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:46:14,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=15.0 2024-08-09 17:46:18,741 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-09 17:46:39,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-09 17:46:48,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. 
limit=15.0 2024-08-09 17:47:11,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121000.0, ans=0.1 2024-08-09 17:47:12,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.990e+01 3.554e+01 4.139e+01 7.218e+01, threshold=7.107e+01, percent-clipped=1.0 2024-08-09 17:47:12,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12100, loss[loss=0.1263, beats_loss=0.01382, ecapa_loss=0.0003286, whisper_loss=0.1092, over 16558.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01322, ecapa_loss=0.0003874, whisper_loss=0.1037, over 3861867.56 frames. ], batch size: 62, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:47:12,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121000.0, ans=0.0 2024-08-09 17:47:16,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.97 vs. limit=22.5 2024-08-09 17:47:18,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-09 17:47:47,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121200.0, ans=0.125 2024-08-09 17:47:55,496 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-09 17:48:00,971 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 from AS 2024-08-09 17:48:02,133 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 from AS 2024-08-09 17:48:06,842 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
22 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 17:48:17,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2024-08-09 17:48:29,409 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12150, loss[loss=0.1637, beats_loss=0.01015, ecapa_loss=0.0004022, whisper_loss=0.1495, over 23366.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01313, ecapa_loss=0.0003905, whisper_loss=0.1045, over 3866737.88 frames. ], batch size: 87, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:49:05,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=121700.0, ans=0.0 2024-08-09 17:49:09,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=121700.0, ans=0.0 2024-08-09 17:49:18,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-08-09 17:49:27,497 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
23 from LS+wenet, 13 from Vox, 19 from AS 2024-08-09 17:49:27,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=121800.0, ans=0.125 2024-08-09 17:49:29,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=121900.0, ans=0.0 2024-08-09 17:49:34,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121900.0, ans=0.125 2024-08-09 17:49:45,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.869e+01 3.277e+01 4.136e+01 6.270e+01, threshold=6.555e+01, percent-clipped=0.0 2024-08-09 17:49:46,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12200, loss[loss=0.1267, beats_loss=0.01152, ecapa_loss=0.0004244, whisper_loss=0.1109, over 14777.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01308, ecapa_loss=0.0003912, whisper_loss=0.1048, over 3890385.11 frames. ], batch size: 61, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:49:55,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0 2024-08-09 17:50:16,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=122200.0, ans=0.09899494936611666 2024-08-09 17:50:24,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=122200.0, ans=0.035 2024-08-09 17:50:29,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122200.0, ans=0.1 2024-08-09 17:51:01,659 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12250, loss[loss=0.1441, beats_loss=0.01179, ecapa_loss=0.0003499, whisper_loss=0.1288, over 21034.00 frames. 
], tot_loss[loss=0.1219, beats_loss=0.01299, ecapa_loss=0.0003893, whisper_loss=0.1051, over 3872443.32 frames. ], batch size: 83, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:51:11,819 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.960e+00 2024-08-09 17:51:14,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122500.0, ans=0.1 2024-08-09 17:51:29,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-08-09 17:51:38,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122700.0, ans=0.1 2024-08-09 17:52:08,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=122900.0, ans=0.0 2024-08-09 17:52:10,952 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 32 from LS+wenet, 19 from Vox, 34 from AS 2024-08-09 17:52:17,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.887e+01 3.272e+01 4.030e+01 7.099e+01, threshold=6.544e+01, percent-clipped=1.0 2024-08-09 17:52:17,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12300, loss[loss=0.1166, beats_loss=0.01385, ecapa_loss=0.0003239, whisper_loss=0.09955, over 21551.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01307, ecapa_loss=0.0003897, whisper_loss=0.1043, over 3901722.98 frames. ], batch size: 88, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:52:27,043 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 from AS 2024-08-09 17:52:28,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=123000.0, ans=0.125 2024-08-09 17:52:39,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=123100.0, ans=0.125 2024-08-09 17:52:52,253 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS 2024-08-09 17:52:58,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=123200.0, ans=0.125 2024-08-09 17:53:01,619 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 from AS 2024-08-09 17:53:20,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=12.0 2024-08-09 17:53:31,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12350, loss[loss=0.1428, beats_loss=0.01149, ecapa_loss=0.0004089, whisper_loss=0.1272, over 22342.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01295, ecapa_loss=0.0003909, whisper_loss=0.1044, over 3910276.45 frames. 
], batch size: 88, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:53:32,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=123500.0, ans=0.2 2024-08-09 17:53:34,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=123500.0, ans=0.125 2024-08-09 17:53:36,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=123500.0, ans=10.0 2024-08-09 17:54:09,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=123700.0, ans=0.0 2024-08-09 17:54:11,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-09 17:54:18,780 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 from AS 2024-08-09 17:54:31,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=123900.0, ans=0.0 2024-08-09 17:54:48,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.013e+01 3.404e+01 4.023e+01 7.879e+01, threshold=6.808e+01, percent-clipped=3.0 2024-08-09 17:54:48,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12400, loss[loss=0.1328, beats_loss=0.01397, ecapa_loss=0.0003197, whisper_loss=0.1156, over 19628.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.013, ecapa_loss=0.0003913, whisper_loss=0.1036, over 3882595.40 frames. ], batch size: 76, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:54:54,207 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 from AS 2024-08-09 17:55:09,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=124100.0, ans=0.0 2024-08-09 17:55:24,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124200.0, ans=0.125 2024-08-09 17:55:41,595 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 14 from Vox, 45 from AS 2024-08-09 17:55:45,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=124400.0, ans=0.0 2024-08-09 17:55:58,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.26 vs. limit=10.0 2024-08-09 17:56:00,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12450, loss[loss=0.09071, beats_loss=0.01066, ecapa_loss=0.0004463, whisper_loss=0.07558, over 15451.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01305, ecapa_loss=0.000388, whisper_loss=0.1038, over 3940955.60 frames. ], batch size: 60, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:56:06,247 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 17:56:07,597 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 from AS 2024-08-09 17:56:07,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:07,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=124500.0, ans=0.0 2024-08-09 17:56:08,920 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 from AS 2024-08-09 17:56:13,322 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-09 17:56:42,646 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 20 from LS+wenet, 22 from Vox, 51 from AS 2024-08-09 17:57:05,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124900.0, ans=0.125 2024-08-09 17:57:14,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.994e+01 3.498e+01 4.030e+01 6.153e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-09 17:57:14,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12500, loss[loss=0.1168, beats_loss=0.01279, ecapa_loss=0.0003979, whisper_loss=0.1, over 21919.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01313, ecapa_loss=0.0003856, whisper_loss=0.1034, over 3941518.84 frames. ], batch size: 89, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:57:16,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-09 17:57:18,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=125000.0, ans=0.125 2024-08-09 17:57:21,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=12.0 2024-08-09 17:57:22,558 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 18 from Vox, 38 from AS 2024-08-09 17:57:30,458 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 17 from Vox, 33 from AS 2024-08-09 17:57:34,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. 
limit=6.0 2024-08-09 17:57:42,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=125200.0, ans=0.125 2024-08-09 17:58:02,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125300.0, ans=0.1 2024-08-09 17:58:03,814 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 17:58:25,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=125400.0, ans=0.07 2024-08-09 17:58:28,871 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12550, loss[loss=0.1259, beats_loss=0.01418, ecapa_loss=0.0004348, whisper_loss=0.1074, over 21426.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.01315, ecapa_loss=0.000386, whisper_loss=0.1038, over 3971795.03 frames. ], batch size: 90, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:58:37,208 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 14 from Vox, 34 from AS 2024-08-09 17:58:44,699 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 from AS 2024-08-09 17:58:52,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=125600.0, ans=0.125 2024-08-09 17:58:59,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. 
limit=15.0 2024-08-09 17:59:13,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=125800.0, ans=0.125 2024-08-09 17:59:43,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.067e+01 3.520e+01 4.433e+01 6.633e+01, threshold=7.039e+01, percent-clipped=0.0 2024-08-09 17:59:43,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12600, loss[loss=0.1331, beats_loss=0.01348, ecapa_loss=0.00035, whisper_loss=0.1162, over 22545.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01318, ecapa_loss=0.0003858, whisper_loss=0.104, over 3973482.14 frames. ], batch size: 89, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:59:50,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-09 17:59:54,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=12.0 2024-08-09 17:59:54,856 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 from AS 2024-08-09 17:59:57,619 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 17 from Vox, 38 from AS 2024-08-09 17:59:59,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=126100.0, ans=0.125 2024-08-09 18:00:11,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=126200.0, ans=0.125 2024-08-09 18:00:13,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=126200.0, ans=0.125 2024-08-09 18:00:13,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=126200.0, ans=0.125 2024-08-09 18:00:16,771 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS 2024-08-09 18:00:25,936 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 from AS 2024-08-09 18:00:46,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-09 18:00:50,047 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 from AS 2024-08-09 18:00:55,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12650, loss[loss=0.1158, beats_loss=0.01533, ecapa_loss=0.000302, whisper_loss=0.09741, over 23276.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01331, ecapa_loss=0.0003845, whisper_loss=0.1029, over 3964605.75 frames. ], batch size: 92, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:00:57,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=126500.0, ans=0.125 2024-08-09 18:01:21,201 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-09 18:01:39,787 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 18:01:53,090 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 18:02:05,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=126900.0, ans=0.125 2024-08-09 18:02:05,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=126900.0, ans=0.125 2024-08-09 18:02:08,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.930e+01 3.194e+01 3.853e+01 8.153e+01, threshold=6.388e+01, percent-clipped=1.0 2024-08-09 18:02:08,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12700, loss[loss=0.116, beats_loss=0.0135, ecapa_loss=0.0003262, whisper_loss=0.09927, over 14955.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01328, ecapa_loss=0.0003806, whisper_loss=0.1025, over 3950753.42 frames. ], batch size: 56, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:02:23,072 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 18:02:35,239 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 18:02:46,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2024-08-09 18:02:55,938 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2024-08-09 18:03:01,020 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 18:03:10,735 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 18:03:22,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12750, loss[loss=0.09947, beats_loss=0.01521, ecapa_loss=0.0003904, whisper_loss=0.08036, over 16916.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01333, ecapa_loss=0.0003786, whisper_loss=0.1022, over 3917069.10 frames. ], batch size: 70, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:03:24,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2024-08-09 18:03:31,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=127500.0, ans=0.125 2024-08-09 18:03:45,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=127600.0, ans=0.5 2024-08-09 18:04:00,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=127700.0, ans=0.0 2024-08-09 18:04:19,886 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 18:04:33,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.049e+01 3.510e+01 3.985e+01 5.812e+01, threshold=7.020e+01, percent-clipped=0.0 2024-08-09 18:04:33,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12800, loss[loss=0.1207, beats_loss=0.01432, ecapa_loss=0.000452, whisper_loss=0.1018, over 21221.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01319, ecapa_loss=0.0003823, whisper_loss=0.1033, over 3934652.59 frames. ], batch size: 88, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:04:40,421 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 18:04:45,906 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.213e-02 2024-08-09 18:04:50,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128100.0, ans=0.1 2024-08-09 18:04:53,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=128100.0, ans=0.0 2024-08-09 18:04:53,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2024-08-09 18:04:54,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=128100.0, ans=0.0 2024-08-09 18:04:57,015 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 18:05:03,725 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 18:05:05,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=128200.0, ans=0.025 2024-08-09 18:05:08,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=128200.0, ans=0.2 2024-08-09 18:05:10,649 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 13 from Vox, 58 fro AS 2024-08-09 18:05:19,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2024-08-09 18:05:20,685 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 18:05:44,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12850, loss[loss=0.1451, beats_loss=0.01106, ecapa_loss=0.0003866, whisper_loss=0.1302, over 23110.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01328, ecapa_loss=0.0003812, whisper_loss=0.1028, over 3891219.55 frames. ], batch size: 88, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:05:45,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128500.0, ans=0.0 2024-08-09 18:05:53,531 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 18:05:56,254 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 18:06:10,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=12.0 2024-08-09 18:06:14,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128700.0, ans=0.1 2024-08-09 18:06:50,473 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 18:06:53,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=128900.0, ans=0.125 2024-08-09 18:06:57,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.723e+01 3.295e+01 4.012e+01 6.106e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-09 18:06:57,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12900, loss[loss=0.1415, beats_loss=0.01292, ecapa_loss=0.0003913, whisper_loss=0.1246, over 20195.00 frames. ], tot_loss[loss=0.1192, beats_loss=0.01334, ecapa_loss=0.0003817, whisper_loss=0.1021, over 3877919.66 frames. 
], batch size: 82, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:06:57,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=129000.0, ans=0.125 2024-08-09 18:07:16,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129100.0, ans=0.0 2024-08-09 18:07:16,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2024-08-09 18:07:23,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-09 18:07:26,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.04 vs. limit=10.0 2024-08-09 18:07:30,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=129200.0, ans=0.125 2024-08-09 18:07:31,637 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 18:07:42,194 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 18:08:00,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-09 18:08:06,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0 2024-08-09 18:08:08,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 12950, loss[loss=0.1202, beats_loss=0.01533, ecapa_loss=0.0003548, whisper_loss=0.1013, over 21433.00 frames. 
], tot_loss[loss=0.1199, beats_loss=0.01322, ecapa_loss=0.0003801, whisper_loss=0.1029, over 3884213.80 frames. ], batch size: 88, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:08:14,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=129500.0, ans=0.125 2024-08-09 18:08:20,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=129500.0, ans=0.1 2024-08-09 18:08:20,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=129500.0, ans=0.125 2024-08-09 18:08:20,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129500.0, ans=0.1 2024-08-09 18:08:24,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2024-08-09 18:08:27,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=129600.0, ans=0.0 2024-08-09 18:08:49,181 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 18:08:50,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=129800.0, ans=0.125 2024-08-09 18:09:09,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. 
limit=15.0 2024-08-09 18:09:24,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 3.005e+01 3.464e+01 3.958e+01 5.866e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 18:09:24,401 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13000, loss[loss=0.141, beats_loss=0.01362, ecapa_loss=0.0003447, whisper_loss=0.124, over 21875.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01317, ecapa_loss=0.000379, whisper_loss=0.103, over 3890026.18 frames. ], batch size: 86, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:09:25,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=130000.0, ans=0.05 2024-08-09 18:09:26,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=130000.0, ans=0.125 2024-08-09 18:09:30,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=130000.0, ans=0.0 2024-08-09 18:09:42,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2024-08-09 18:10:08,818 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 18:10:13,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130300.0, ans=0.1 2024-08-09 18:10:27,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=130400.0, ans=0.125 2024-08-09 18:10:31,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=130400.0, ans=0.2 2024-08-09 18:10:38,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13050, loss[loss=0.1338, beats_loss=0.01189, ecapa_loss=0.0004268, whisper_loss=0.1177, over 19844.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01316, ecapa_loss=0.0003792, whisper_loss=0.1031, over 3869558.82 frames. ], batch size: 81, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:10:45,101 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 18:11:02,291 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 18:11:36,467 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.459e-01 2024-08-09 18:11:44,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=130800.0, ans=0.125 2024-08-09 18:11:53,523 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 18:11:55,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=130900.0, ans=0.0 2024-08-09 18:12:08,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.885e+01 3.590e+01 4.189e+01 8.103e+01, threshold=7.179e+01, percent-clipped=1.0 2024-08-09 18:12:08,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13100, loss[loss=0.1206, beats_loss=0.01354, ecapa_loss=0.0003276, whisper_loss=0.1038, over 18860.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01325, ecapa_loss=0.0003765, whisper_loss=0.1028, over 3881095.49 frames. ], batch size: 72, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:12:13,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=22.5 2024-08-09 18:12:19,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=131000.0, ans=0.125 2024-08-09 18:12:27,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=131100.0, ans=0.0 2024-08-09 18:12:48,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=131200.0, ans=0.0 2024-08-09 18:13:00,742 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 18:13:35,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=131400.0, ans=0.125 2024-08-09 18:13:41,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13150, loss[loss=0.1248, beats_loss=0.0134, ecapa_loss=0.0004481, whisper_loss=0.1069, over 20753.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01321, ecapa_loss=0.0003758, whisper_loss=0.1029, over 3870228.77 frames. 
], batch size: 88, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:13:56,077 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 38 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 18:14:02,969 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.364e-02 2024-08-09 18:14:12,418 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 18:14:36,246 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 18:14:39,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=131700.0, ans=0.025 2024-08-09 18:14:46,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=131800.0, ans=0.0 2024-08-09 18:14:53,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131800.0, ans=0.1 2024-08-09 18:15:02,331 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 18:15:10,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=131900.0, ans=0.125 2024-08-09 18:15:12,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-08-09 18:15:28,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.36 vs. 
limit=15.0 2024-08-09 18:15:30,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=132000.0, ans=0.0 2024-08-09 18:15:31,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.963e+01 3.357e+01 4.080e+01 6.559e+01, threshold=6.714e+01, percent-clipped=0.0 2024-08-09 18:15:31,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13200, loss[loss=0.1204, beats_loss=0.01181, ecapa_loss=0.0003469, whisper_loss=0.1051, over 19189.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01317, ecapa_loss=0.0003739, whisper_loss=0.1035, over 3881569.92 frames. ], batch size: 73, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:15:39,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=132000.0, ans=0.2 2024-08-09 18:16:04,957 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 18:16:07,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2024-08-09 18:16:15,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=132200.0, ans=0.0 2024-08-09 18:16:29,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132200.0, ans=0.1 2024-08-09 18:16:34,048 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 18:16:40,159 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-09 18:16:42,295 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 18:17:03,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=132400.0, ans=0.125 2024-08-09 18:17:11,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=132400.0, ans=6.0 2024-08-09 18:17:16,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13250, loss[loss=0.1189, beats_loss=0.01279, ecapa_loss=0.0004368, whisper_loss=0.1018, over 17254.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01315, ecapa_loss=0.0003725, whisper_loss=0.1029, over 3870686.24 frames. ], batch size: 71, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:18:02,227 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-09 18:18:06,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=132700.0, ans=0.125 2024-08-09 18:18:15,595 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 18:18:40,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.970e+01 3.375e+01 4.348e+01 9.574e+01, threshold=6.749e+01, percent-clipped=3.0 2024-08-09 18:18:40,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13300, loss[loss=0.1164, beats_loss=0.0151, ecapa_loss=0.0003893, whisper_loss=0.09743, over 21379.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01308, ecapa_loss=0.0003739, whisper_loss=0.1032, over 3866997.29 frames. ], batch size: 89, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:18:43,302 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
8 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 18:18:49,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133000.0, ans=0.1 2024-08-09 18:19:12,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=133200.0, ans=0.1 2024-08-09 18:19:16,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133200.0, ans=0.1 2024-08-09 18:19:22,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=133300.0, ans=0.125 2024-08-09 18:19:49,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=133500.0, ans=0.125 2024-08-09 18:19:50,138 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13350, loss[loss=0.1311, beats_loss=0.01234, ecapa_loss=0.000423, whisper_loss=0.1145, over 22391.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01318, ecapa_loss=0.0003744, whisper_loss=0.103, over 3896110.10 frames. ], batch size: 91, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:19:56,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=133500.0, ans=0.2 2024-08-09 18:19:56,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=133500.0, ans=0.0 2024-08-09 18:20:05,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=133600.0, ans=0.0 2024-08-09 18:20:09,553 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 18:20:11,284 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:20:16,947 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 18:20:20,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.46 vs. limit=22.5 2024-08-09 18:20:25,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=133700.0, ans=0.125 2024-08-09 18:20:27,948 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 18:20:28,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=133700.0, ans=0.125 2024-08-09 18:20:31,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=133700.0, ans=0.0 2024-08-09 18:20:32,725 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 18:20:37,092 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-09 18:20:41,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-09 18:20:45,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=133800.0, ans=0.0 2024-08-09 18:20:48,536 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 18:20:55,583 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
14 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-09 18:21:01,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=133900.0, ans=0.125 2024-08-09 18:21:03,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 3.029e+01 3.343e+01 3.897e+01 6.977e+01, threshold=6.687e+01, percent-clipped=1.0 2024-08-09 18:21:03,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13400, loss[loss=0.1034, beats_loss=0.01326, ecapa_loss=0.0003949, whisper_loss=0.08619, over 20072.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01317, ecapa_loss=0.0003759, whisper_loss=0.1028, over 3888656.83 frames. ], batch size: 83, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:21:06,616 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 18:21:11,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=134000.0, ans=0.125 2024-08-09 18:21:13,566 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 18:21:17,906 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 18:21:29,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=134100.0, ans=0.125 2024-08-09 18:21:34,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134200.0, ans=0.125 2024-08-09 18:21:41,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134200.0, ans=0.125 2024-08-09 18:21:42,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=12.0 2024-08-09 18:21:49,943 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 18:21:51,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=134300.0, ans=0.125 2024-08-09 18:21:53,897 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 18:22:07,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2024-08-09 18:22:13,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13450, loss[loss=0.1117, beats_loss=0.01431, ecapa_loss=0.0003412, whisper_loss=0.09401, over 15321.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01312, ecapa_loss=0.0003755, whisper_loss=0.1026, over 3884452.86 frames. ], batch size: 61, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:22:14,819 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-09 18:22:18,871 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 18:22:21,753 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 18:22:23,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=134500.0, ans=0.0 2024-08-09 18:22:27,987 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 18:22:31,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. 
limit=15.0 2024-08-09 18:22:46,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=134700.0, ans=0.0 2024-08-09 18:23:00,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=12.0 2024-08-09 18:23:18,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=134900.0, ans=0.2 2024-08-09 18:23:23,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.856e+01 3.489e+01 4.024e+01 6.380e+01, threshold=6.978e+01, percent-clipped=0.0 2024-08-09 18:23:23,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13500, loss[loss=0.133, beats_loss=0.01263, ecapa_loss=0.0003878, whisper_loss=0.1165, over 20320.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01315, ecapa_loss=0.0003754, whisper_loss=0.1031, over 3870842.70 frames. ], batch size: 78, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:23:25,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=135000.0, ans=10.0 2024-08-09 18:23:43,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=135100.0, ans=0.95 2024-08-09 18:23:45,653 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 18:23:56,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2024-08-09 18:24:04,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=135300.0, ans=0.125 2024-08-09 18:24:11,647 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-09 18:24:19,100 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 18:24:21,765 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 18:24:26,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=135400.0, ans=0.2 2024-08-09 18:24:34,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13550, loss[loss=0.1392, beats_loss=0.01206, ecapa_loss=0.0003495, whisper_loss=0.1236, over 20182.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01312, ecapa_loss=0.0003742, whisper_loss=0.1037, over 3886629.58 frames. ], batch size: 79, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:24:45,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=135500.0, ans=0.2 2024-08-09 18:24:46,219 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 18:24:54,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=135600.0, ans=0.0 2024-08-09 18:25:10,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=135700.0, ans=0.0 2024-08-09 18:25:17,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=135800.0, ans=0.04949747468305833 2024-08-09 18:25:23,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=135800.0, ans=0.125 2024-08-09 18:25:34,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=135900.0, ans=0.125 2024-08-09 18:25:40,014 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 18:25:47,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 3.070e+01 3.576e+01 4.104e+01 5.875e+01, threshold=7.153e+01, percent-clipped=0.0 2024-08-09 18:25:47,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13600, loss[loss=0.1159, beats_loss=0.01581, ecapa_loss=0.0003034, whisper_loss=0.09701, over 19430.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01302, ecapa_loss=0.0003729, whisper_loss=0.1036, over 3865256.71 frames. ], batch size: 76, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:26:04,643 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 18:26:10,680 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-09 18:26:10,982 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-09 18:26:23,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=136200.0, ans=0.125 2024-08-09 18:26:24,267 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-08-09 18:26:39,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136300.0, ans=0.125 2024-08-09 18:26:40,927 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0 2024-08-09 18:26:43,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136400.0, ans=0.1 2024-08-09 18:26:46,469 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
42 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 18:26:58,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13650, loss[loss=0.1152, beats_loss=0.01494, ecapa_loss=0.0002977, whisper_loss=0.09728, over 17154.00 frames. ], tot_loss[loss=0.12, beats_loss=0.0131, ecapa_loss=0.0003736, whisper_loss=0.1031, over 3869009.08 frames. ], batch size: 64, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:26:58,860 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-09 18:27:16,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136600.0, ans=0.0 2024-08-09 18:27:27,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=136700.0, ans=0.0 2024-08-09 18:27:33,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=22.5 2024-08-09 18:27:51,814 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:27:56,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=136900.0, ans=0.2 2024-08-09 18:27:57,023 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 18:28:09,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2024-08-09 18:28:09,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.873e+01 3.233e+01 3.836e+01 5.786e+01, threshold=6.466e+01, percent-clipped=0.0 2024-08-09 18:28:09,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13700, loss[loss=0.09229, beats_loss=0.01344, ecapa_loss=0.0003604, whisper_loss=0.07525, over 16743.00 frames. 
], tot_loss[loss=0.1199, beats_loss=0.01308, ecapa_loss=0.0003755, whisper_loss=0.1031, over 3846668.13 frames. ], batch size: 65, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:28:23,117 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 18:28:23,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2024-08-09 18:28:26,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=137100.0, ans=0.0 2024-08-09 18:28:33,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=137100.0, ans=0.125 2024-08-09 18:28:40,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=137200.0, ans=0.125 2024-08-09 18:28:48,406 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 18:28:51,434 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 18:29:11,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0 2024-08-09 18:29:20,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13750, loss[loss=0.1334, beats_loss=0.01292, ecapa_loss=0.0002532, whisper_loss=0.1179, over 22860.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01309, ecapa_loss=0.0003719, whisper_loss=0.1026, over 3832081.19 frames. ], batch size: 83, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:29:20,524 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 18:29:23,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=137500.0, ans=0.125 2024-08-09 18:29:29,819 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 18:29:33,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=137600.0, ans=0.07 2024-08-09 18:29:36,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137600.0, ans=0.1 2024-08-09 18:29:43,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137600.0, ans=0.1 2024-08-09 18:29:50,936 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 18:29:56,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=12.0 2024-08-09 18:29:57,852 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 18:30:06,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=137800.0, ans=0.0 2024-08-09 18:30:11,492 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 18:30:28,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.986e+01 3.490e+01 4.118e+01 8.159e+01, threshold=6.980e+01, percent-clipped=6.0 2024-08-09 18:30:28,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13800, loss[loss=0.094, beats_loss=0.01661, ecapa_loss=0.0003511, whisper_loss=0.07388, over 15244.00 frames. 
], tot_loss[loss=0.1198, beats_loss=0.01309, ecapa_loss=0.0003739, whisper_loss=0.103, over 3846678.64 frames. ], batch size: 62, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:30:47,092 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 18:30:55,539 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-09 18:31:02,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=138200.0, ans=0.125 2024-08-09 18:31:10,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=138300.0, ans=0.0 2024-08-09 18:31:31,942 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 18:31:33,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=138400.0, ans=0.2 2024-08-09 18:31:37,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13850, loss[loss=0.1131, beats_loss=0.01514, ecapa_loss=0.000424, whisper_loss=0.09371, over 16726.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01304, ecapa_loss=0.0003753, whisper_loss=0.1033, over 3869933.00 frames. ], batch size: 69, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:32:15,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138700.0, ans=0.125 2024-08-09 18:32:32,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-09 18:32:34,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. 
limit=15.0 2024-08-09 18:32:35,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=138900.0, ans=0.125 2024-08-09 18:32:38,306 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 18:32:44,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=138900.0, ans=0.125 2024-08-09 18:32:47,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=138900.0, ans=0.0 2024-08-09 18:32:49,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.816e+01 3.337e+01 3.813e+01 6.629e+01, threshold=6.673e+01, percent-clipped=0.0 2024-08-09 18:32:49,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13900, loss[loss=0.1303, beats_loss=0.01113, ecapa_loss=0.0004045, whisper_loss=0.1151, over 21962.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01296, ecapa_loss=0.0003752, whisper_loss=0.1034, over 3882900.48 frames. ], batch size: 92, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:32:56,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-08-09 18:32:57,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=139000.0, ans=0.125 2024-08-09 18:33:01,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=139000.0, ans=0.125 2024-08-09 18:33:07,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. 
limit=15.0 2024-08-09 18:33:17,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-09 18:33:50,699 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 18:34:00,060 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 13950, loss[loss=0.07794, beats_loss=0.01192, ecapa_loss=0.0003533, whisper_loss=0.06249, over 14184.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01296, ecapa_loss=0.0003755, whisper_loss=0.1039, over 3857753.75 frames. ], batch size: 55, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:34:25,896 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 18:34:35,416 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 18:34:55,473 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 18:35:05,051 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 18:35:09,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.055e+01 3.459e+01 4.049e+01 5.260e+01, threshold=6.917e+01, percent-clipped=0.0 2024-08-09 18:35:09,096 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14000, loss[loss=0.1322, beats_loss=0.0117, ecapa_loss=0.0003663, whisper_loss=0.1169, over 22018.00 frames. ], tot_loss[loss=0.1208, beats_loss=0.0129, ecapa_loss=0.0003711, whisper_loss=0.1041, over 3863956.83 frames. ], batch size: 89, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:35:25,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=140100.0, ans=0.02 2024-08-09 18:35:26,380 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
27 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 18:35:30,357 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 18:35:33,522 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 18:35:43,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=140200.0, ans=0.0 2024-08-09 18:35:49,748 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 18:35:51,054 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 24 from LS+wenet, 7 from Vox, 25 fro AS 2024-08-09 18:35:59,982 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 18:36:11,488 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-09 18:36:18,036 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14050, loss[loss=0.1102, beats_loss=0.01652, ecapa_loss=0.0003124, whisper_loss=0.0906, over 19099.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01297, ecapa_loss=0.0003703, whisper_loss=0.1043, over 3848904.16 frames. 
], batch size: 73, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:36:21,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=140500.0, ans=0.0 2024-08-09 18:36:42,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140600.0, ans=0.1 2024-08-09 18:36:45,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=140700.0, ans=0.125 2024-08-09 18:36:48,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=140700.0, ans=0.2 2024-08-09 18:36:49,425 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 18:36:57,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=140700.0, ans=0.0 2024-08-09 18:37:07,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=140800.0, ans=0.125 2024-08-09 18:37:08,622 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-09 18:37:10,055 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-09 18:37:21,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=140900.0, ans=0.0 2024-08-09 18:37:22,405 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 18:37:27,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.062e+01 3.430e+01 4.130e+01 6.899e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 18:37:27,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14100, loss[loss=0.1034, beats_loss=0.01491, ecapa_loss=0.0003397, whisper_loss=0.0851, over 14602.00 frames. ], tot_loss[loss=0.1202, beats_loss=0.01299, ecapa_loss=0.0003675, whisper_loss=0.1036, over 3804642.11 frames. ], batch size: 56, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:37:32,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-09 18:37:36,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2024-08-09 18:37:39,061 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 18:37:55,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=141200.0, ans=0.035 2024-08-09 18:37:55,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=141200.0, ans=0.125 2024-08-09 18:38:02,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=141200.0, ans=0.125 2024-08-09 18:38:05,380 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 18:38:10,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. 
limit=6.0 2024-08-09 18:38:12,884 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:38:15,255 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 18:38:17,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-09 18:38:36,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141500.0, ans=0.1 2024-08-09 18:38:37,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14150, loss[loss=0.09344, beats_loss=0.01197, ecapa_loss=0.0003077, whisper_loss=0.07839, over 14608.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01306, ecapa_loss=0.0003667, whisper_loss=0.103, over 3809318.29 frames. ], batch size: 56, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:38:39,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141500.0, ans=0.1 2024-08-09 18:38:44,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=141500.0, ans=0.04949747468305833 2024-08-09 18:38:50,234 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-09 18:39:43,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=141900.0, ans=0.125 2024-08-09 18:39:44,681 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-09 18:39:48,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.107e+01 3.530e+01 4.182e+01 6.705e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-09 18:39:48,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14200, loss[loss=0.07932, beats_loss=0.0171, ecapa_loss=0.0004465, whisper_loss=0.05775, over 19642.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01303, ecapa_loss=0.0003676, whisper_loss=0.1024, over 3807118.41 frames. ], batch size: 84, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:40:09,566 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 18:40:11,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=142100.0, ans=0.5 2024-08-09 18:40:24,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=142200.0, ans=0.0 2024-08-09 18:40:44,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=142300.0, ans=0.0 2024-08-09 18:41:04,235 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14250, loss[loss=0.123, beats_loss=0.01127, ecapa_loss=0.0004291, whisper_loss=0.1074, over 13819.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01306, ecapa_loss=0.0003663, whisper_loss=0.1028, over 3843890.61 frames. ], batch size: 55, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:41:06,322 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 18:41:39,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=142700.0, ans=0.0 2024-08-09 18:41:51,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.96 vs. 
limit=22.5 2024-08-09 18:42:19,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.991e+01 3.300e+01 4.002e+01 6.725e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-09 18:42:19,842 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14300, loss[loss=0.1134, beats_loss=0.01394, ecapa_loss=0.000424, whisper_loss=0.09525, over 17207.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01301, ecapa_loss=0.0003669, whisper_loss=0.1034, over 3874384.50 frames. ], batch size: 76, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:42:30,528 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-09 18:42:56,076 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 18:43:01,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=143200.0, ans=0.0 2024-08-09 18:43:03,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=143300.0, ans=10.0 2024-08-09 18:43:04,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=143300.0, ans=0.125 2024-08-09 18:43:05,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=143300.0, ans=10.0 2024-08-09 18:43:09,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2024-08-09 18:43:33,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14350, loss[loss=0.1075, beats_loss=0.01521, ecapa_loss=0.000329, whisper_loss=0.08902, over 20662.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.013, ecapa_loss=0.0003656, whisper_loss=0.1038, over 3886960.73 frames. 
], batch size: 83, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:43:40,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=143500.0, ans=0.125 2024-08-09 18:43:46,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=143600.0, ans=0.125 2024-08-09 18:43:49,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=143600.0, ans=0.125 2024-08-09 18:44:25,729 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 18:44:27,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143800.0, ans=0.125 2024-08-09 18:44:30,624 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.761e+00 2024-08-09 18:44:43,325 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 18:44:48,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.978e+01 3.379e+01 3.872e+01 1.013e+02, threshold=6.758e+01, percent-clipped=3.0 2024-08-09 18:44:48,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14400, loss[loss=0.1084, beats_loss=0.01283, ecapa_loss=0.0004131, whisper_loss=0.09141, over 13905.00 frames. ], tot_loss[loss=0.121, beats_loss=0.0129, ecapa_loss=0.0003681, whisper_loss=0.1044, over 3887574.52 frames. 
], batch size: 55, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:44:56,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=144000.0, ans=0.0 2024-08-09 18:44:59,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=144000.0, ans=0.05 2024-08-09 18:45:21,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144200.0, ans=0.125 2024-08-09 18:45:37,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144300.0, ans=0.1 2024-08-09 18:45:37,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-09 18:45:42,375 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 18:45:44,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2024-08-09 18:45:47,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=144400.0, ans=0.0 2024-08-09 18:45:48,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=144400.0, ans=0.2 2024-08-09 18:45:51,592 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-09 18:45:51,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=144400.0, ans=0.125 2024-08-09 18:46:01,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 1, batch 14450, loss[loss=0.1277, beats_loss=0.0123, ecapa_loss=0.0004031, whisper_loss=0.1113, over 17913.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01298, ecapa_loss=0.0003683, whisper_loss=0.1038, over 3878154.85 frames. ], batch size: 71, lr: 3.05e-02, grad_scale: 4096.0 2024-08-09 18:46:02,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=144500.0, ans=0.0 2024-08-09 18:46:03,249 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 18:46:10,274 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 18:46:19,115 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:46:22,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144600.0, ans=0.125 2024-08-09 18:46:31,748 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 18:46:51,628 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 18:47:02,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-08-09 18:47:03,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2024-08-09 18:47:50,370 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 18:47:51,903 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 0, loss[loss=0.1081, beats_loss=0.01563, ecapa_loss=0.0003818, whisper_loss=0.08869, over 18314.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01563, ecapa_loss=0.0003818, whisper_loss=0.08869, over 18314.00 frames. ], batch size: 72, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:47:51,903 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 18:48:33,900 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on ASR_libri: loss=0.287, beats_loss=0, ecapa_loss=0.001066, whisper_loss=0.2763, over 922467.00 frames. 2024-08-09 18:48:50,209 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on SV_voxceleb1: loss=0.009611, beats_loss=0, ecapa_loss=0.0009611, whisper_loss=0, over 939242.00 frames. 2024-08-09 18:50:53,471 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on AT_audioset: loss=0.0306, beats_loss=0.0306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 18:50:53,474 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 18:50:56,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.997e+01 3.426e+01 4.261e+01 6.161e+01, threshold=6.853e+01, percent-clipped=0.0 2024-08-09 18:51:02,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. 
limit=15.0 2024-08-09 18:51:14,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144980.0, ans=0.1 2024-08-09 18:51:23,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=145080.0, ans=0.125 2024-08-09 18:51:31,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=145080.0, ans=0.125 2024-08-09 18:51:37,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=145080.0, ans=12.0 2024-08-09 18:51:50,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=145180.0, ans=0.0 2024-08-09 18:52:38,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=145380.0, ans=0.0 2024-08-09 18:53:03,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 50, loss[loss=0.1164, beats_loss=0.01359, ecapa_loss=0.0003747, whisper_loss=0.09912, over 17797.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01334, ecapa_loss=0.0003745, whisper_loss=0.1019, over 885739.23 frames. ], batch size: 70, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:53:44,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=145580.0, ans=0.0 2024-08-09 18:53:46,171 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 18:53:46,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=145580.0, ans=0.0 2024-08-09 18:53:51,159 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-09 18:53:54,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=12.0 2024-08-09 18:54:25,992 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 18:54:40,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=145880.0, ans=0.0 2024-08-09 18:54:52,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=145880.0, ans=0.125 2024-08-09 18:55:03,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 100, loss[loss=0.112, beats_loss=0.01326, ecapa_loss=0.0003459, whisper_loss=0.09524, over 21175.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01342, ecapa_loss=0.0003641, whisper_loss=0.1024, over 1575353.46 frames. ], batch size: 81, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:55:07,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.227e+01 3.507e+01 4.114e+01 7.130e+01, threshold=7.014e+01, percent-clipped=1.0 2024-08-09 18:56:07,159 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 18:56:35,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-09 18:56:36,508 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 18:56:53,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 150, loss[loss=0.1351, beats_loss=0.009815, ecapa_loss=0.0004275, whisper_loss=0.121, over 18950.00 frames. ], tot_loss[loss=0.12, beats_loss=0.01325, ecapa_loss=0.0003628, whisper_loss=0.1031, over 2071109.62 frames. 
], batch size: 76, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:56:54,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2024-08-09 18:57:10,071 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 18:57:12,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=146580.0, ans=0.125 2024-08-09 18:57:24,552 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 18:57:32,794 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:57:43,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146680.0, ans=0.1 2024-08-09 18:57:59,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=146780.0, ans=0.0 2024-08-09 18:58:04,751 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-09 18:58:12,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=146880.0, ans=0.95 2024-08-09 18:58:20,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 200, loss[loss=0.08967, beats_loss=0.01352, ecapa_loss=0.0002992, whisper_loss=0.07316, over 16109.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01295, ecapa_loss=0.0003596, whisper_loss=0.1035, over 2432043.50 frames. 
], batch size: 63, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:58:23,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.970e+01 3.444e+01 4.293e+01 6.916e+01, threshold=6.888e+01, percent-clipped=0.0 2024-08-09 18:58:28,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146980.0, ans=0.1 2024-08-09 18:58:39,803 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 18:58:50,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147080.0, ans=0.125 2024-08-09 18:59:02,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2024-08-09 18:59:05,633 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 18:59:32,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147380.0, ans=0.125 2024-08-09 18:59:33,664 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-09 18:59:37,963 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 10 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 18:59:39,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 250, loss[loss=0.08222, beats_loss=0.01189, ecapa_loss=0.0003143, whisper_loss=0.06719, over 14156.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01283, ecapa_loss=0.0003538, whisper_loss=0.1031, over 2747893.88 frames. ], batch size: 54, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:59:40,583 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 18:59:43,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.98 vs. limit=15.0 2024-08-09 18:59:45,167 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-09 18:59:47,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=147480.0, ans=0.2 2024-08-09 19:00:09,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=147680.0, ans=10.0 2024-08-09 19:00:10,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=147680.0, ans=0.125 2024-08-09 19:00:45,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147880.0, ans=0.125 2024-08-09 19:00:48,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=147880.0, ans=0.2 2024-08-09 19:00:54,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 300, loss[loss=0.1295, beats_loss=0.01101, ecapa_loss=0.0003811, whisper_loss=0.1146, over 22126.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01279, ecapa_loss=0.000352, whisper_loss=0.102, over 2968197.28 frames. ], batch size: 88, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 19:00:57,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 3.134e+01 3.449e+01 4.098e+01 7.776e+01, threshold=6.897e+01, percent-clipped=1.0 2024-08-09 19:01:07,999 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
20 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-09 19:01:43,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148280.0, ans=0.1 2024-08-09 19:01:51,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-09 19:01:52,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=148380.0, ans=0.2 2024-08-09 19:02:02,572 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 19:02:02,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=148380.0, ans=0.0 2024-08-09 19:02:08,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 350, loss[loss=0.1162, beats_loss=0.01124, ecapa_loss=0.0003537, whisper_loss=0.1014, over 19733.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01285, ecapa_loss=0.0003489, whisper_loss=0.1018, over 3146252.76 frames. ], batch size: 81, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:02:14,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=148480.0, ans=0.125 2024-08-09 19:02:22,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=148580.0, ans=0.125 2024-08-09 19:02:24,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=148580.0, ans=0.125 2024-08-09 19:02:32,412 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 19:03:09,592 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 19:03:12,439 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 19:03:23,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 400, loss[loss=0.109, beats_loss=0.0132, ecapa_loss=0.0003717, whisper_loss=0.09212, over 17659.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01285, ecapa_loss=0.0003465, whisper_loss=0.1017, over 3263613.43 frames. ], batch size: 70, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:03:24,557 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 19:03:25,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.813e+01 3.235e+01 3.879e+01 6.977e+01, threshold=6.469e+01, percent-clipped=1.0 2024-08-09 19:03:27,332 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 19:03:30,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=22.5 2024-08-09 19:03:44,118 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 19:03:44,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=149080.0, ans=0.2 2024-08-09 19:04:01,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149180.0, ans=0.0 2024-08-09 19:04:11,263 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 19:04:28,553 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
29 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 19:04:28,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149380.0, ans=0.1 2024-08-09 19:04:38,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 450, loss[loss=0.1312, beats_loss=0.009378, ecapa_loss=0.0003498, whisper_loss=0.1183, over 20286.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01288, ecapa_loss=0.000345, whisper_loss=0.1013, over 3414047.74 frames. ], batch size: 76, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:04:50,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2024-08-09 19:04:56,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149580.0, ans=0.0 2024-08-09 19:05:06,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=149580.0, ans=0.125 2024-08-09 19:05:11,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=149680.0, ans=0.0 2024-08-09 19:05:22,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=149780.0, ans=0.125 2024-08-09 19:05:34,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=149780.0, ans=0.125 2024-08-09 19:05:35,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.37 vs. limit=15.0 2024-08-09 19:05:37,421 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 25 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-09 19:05:40,482 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 19:05:40,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149880.0, ans=0.1 2024-08-09 19:05:52,134 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0 2024-08-09 19:05:54,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 500, loss[loss=0.1282, beats_loss=0.01286, ecapa_loss=0.0003029, whisper_loss=0.1123, over 20092.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01268, ecapa_loss=0.0003442, whisper_loss=0.102, over 3522841.12 frames. ], batch size: 74, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:05:54,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=149980.0, ans=0.0 2024-08-09 19:05:57,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.962e+01 3.493e+01 4.226e+01 6.986e+01, threshold=6.987e+01, percent-clipped=1.0 2024-08-09 19:06:00,167 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:06:25,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.24 vs. 
limit=15.0 2024-08-09 19:06:31,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=150180.0, ans=0.04949747468305833 2024-08-09 19:06:38,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=150280.0, ans=0.2 2024-08-09 19:07:04,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:04,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=150380.0, ans=0.125 2024-08-09 19:07:10,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 550, loss[loss=0.1276, beats_loss=0.01294, ecapa_loss=0.000287, whisper_loss=0.1118, over 17053.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01261, ecapa_loss=0.0003437, whisper_loss=0.1021, over 3592658.59 frames. ], batch size: 65, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:07:13,512 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 19:07:53,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=150680.0, ans=0.125 2024-08-09 19:08:08,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=150780.0, ans=0.125 2024-08-09 19:08:08,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150780.0, ans=0.1 2024-08-09 19:08:18,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.28 vs. 
limit=15.0 2024-08-09 19:08:26,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 600, loss[loss=0.1237, beats_loss=0.01271, ecapa_loss=0.0002864, whisper_loss=0.1081, over 15774.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01267, ecapa_loss=0.0003407, whisper_loss=0.1019, over 3627672.08 frames. ], batch size: 61, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:08:28,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.923e+01 3.308e+01 3.857e+01 5.897e+01, threshold=6.616e+01, percent-clipped=0.0 2024-08-09 19:08:39,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=151080.0, ans=0.0 2024-08-09 19:09:01,464 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 19:09:06,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.45 vs. limit=15.0 2024-08-09 19:09:12,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151280.0, ans=0.125 2024-08-09 19:09:26,409 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 28 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-09 19:09:39,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-09 19:09:40,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 650, loss[loss=0.09708, beats_loss=0.01405, ecapa_loss=0.0003571, whisper_loss=0.07946, over 16152.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01276, ecapa_loss=0.0003425, whisper_loss=0.1019, over 3701776.53 frames. 
], batch size: 64, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:09:53,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=151580.0, ans=0.1 2024-08-09 19:09:56,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=151580.0, ans=0.1 2024-08-09 19:10:09,340 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 19:10:47,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=151880.0, ans=0.2 2024-08-09 19:10:49,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=151880.0, ans=0.125 2024-08-09 19:10:49,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151880.0, ans=0.1 2024-08-09 19:10:55,128 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 700, loss[loss=0.1023, beats_loss=0.01318, ecapa_loss=0.0003621, whisper_loss=0.08554, over 21775.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01284, ecapa_loss=0.0003391, whisper_loss=0.1015, over 3749741.99 frames. ], batch size: 90, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:10:55,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=151980.0, ans=0.125 2024-08-09 19:10:57,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.682e+01 3.217e+01 3.765e+01 7.105e+01, threshold=6.434e+01, percent-clipped=1.0 2024-08-09 19:11:00,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=151980.0, ans=0.125 2024-08-09 19:11:07,021 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 19:11:13,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2024-08-09 19:11:21,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=152080.0, ans=0.125 2024-08-09 19:11:25,597 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.608e-01 2024-08-09 19:11:27,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=152180.0, ans=0.2 2024-08-09 19:11:55,320 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 19:11:59,951 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-09 19:12:10,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 750, loss[loss=0.1127, beats_loss=0.01206, ecapa_loss=0.0003243, whisper_loss=0.09744, over 23082.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01273, ecapa_loss=0.0003407, whisper_loss=0.1018, over 3751004.18 frames. ], batch size: 92, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:12:21,103 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 19:12:25,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0 2024-08-09 19:12:26,999 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 19:12:38,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.52 vs. 
limit=15.0 2024-08-09 19:12:50,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=152680.0, ans=0.125 2024-08-09 19:13:01,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=152780.0, ans=0.07 2024-08-09 19:13:10,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=152880.0, ans=0.125 2024-08-09 19:13:26,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 800, loss[loss=0.1005, beats_loss=0.01593, ecapa_loss=0.0003691, whisper_loss=0.08091, over 21252.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01269, ecapa_loss=0.0003397, whisper_loss=0.102, over 3755888.89 frames. ], batch size: 89, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:13:27,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=152980.0, ans=0.125 2024-08-09 19:13:30,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.796e+01 3.224e+01 3.871e+01 5.736e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-09 19:13:35,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2024-08-09 19:13:53,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=153080.0, ans=0.0 2024-08-09 19:14:24,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=153280.0, ans=0.125 2024-08-09 19:14:35,689 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 19:14:43,356 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 850, loss[loss=0.1098, beats_loss=0.01258, ecapa_loss=0.0003746, whisper_loss=0.09352, over 18323.00 frames. 
], tot_loss[loss=0.1169, beats_loss=0.01269, ecapa_loss=0.0003384, whisper_loss=0.1009, over 3788062.90 frames. ], batch size: 74, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:14:49,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=153480.0, ans=0.07 2024-08-09 19:14:50,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-08-09 19:14:53,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=153480.0, ans=0.125 2024-08-09 19:15:02,379 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 19:15:10,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153580.0, ans=0.125 2024-08-09 19:15:13,455 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 30 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 19:15:13,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=153680.0, ans=0.0 2024-08-09 19:15:25,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153680.0, ans=0.1 2024-08-09 19:15:42,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=153780.0, ans=0.0 2024-08-09 19:15:43,488 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-09 19:15:45,467 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.690e-02 2024-08-09 19:16:02,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 900, loss[loss=0.1116, beats_loss=0.01677, ecapa_loss=0.0003145, whisper_loss=0.09168, over 22366.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01278, ecapa_loss=0.0003384, whisper_loss=0.1003, over 3772378.30 frames. ], batch size: 93, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:16:05,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.273e+01 2.893e+01 3.249e+01 3.934e+01 7.637e+01, threshold=6.497e+01, percent-clipped=1.0 2024-08-09 19:16:08,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153980.0, ans=0.125 2024-08-09 19:16:09,276 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 19:16:15,135 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 19:16:27,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-08-09 19:16:41,349 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 19:16:43,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=154180.0, ans=0.035 2024-08-09 19:16:49,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=15.0 2024-08-09 19:16:50,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154280.0, ans=0.125 2024-08-09 19:16:56,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-09 19:16:59,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=154280.0, ans=0.125 2024-08-09 19:17:00,159 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 19:17:15,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=12.0 2024-08-09 19:17:19,531 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 950, loss[loss=0.1065, beats_loss=0.01205, ecapa_loss=0.0003051, whisper_loss=0.09139, over 16897.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01282, ecapa_loss=0.0003387, whisper_loss=0.1002, over 3792524.27 frames. ], batch size: 65, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:17:24,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=154480.0, ans=0.0 2024-08-09 19:17:44,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=154580.0, ans=0.125 2024-08-09 19:17:50,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=154680.0, ans=0.125 2024-08-09 19:17:57,964 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-09 19:18:14,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.24 vs. 
limit=22.5
2024-08-09 19:18:17,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=154780.0, ans=0.1
2024-08-09 19:18:17,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=154780.0, ans=0.125
2024-08-09 19:18:19,518 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 from AS
2024-08-09 19:18:26,350 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=15.0
2024-08-09 19:18:30,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=154880.0, ans=0.2
2024-08-09 19:18:33,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2024-08-09 19:18:37,865 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1000, loss[loss=0.1407, beats_loss=0.01128, ecapa_loss=0.0003079, whisper_loss=0.1263, over 20989.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01286, ecapa_loss=0.0003367, whisper_loss=0.1003, over 3786007.17 frames. ], batch size: 77, lr: 2.91e-02, grad_scale: 4096.0
2024-08-09 19:18:41,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.941e+01 3.307e+01 3.877e+01 7.420e+01, threshold=6.613e+01, percent-clipped=2.0
2024-08-09 19:18:41,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=154980.0, ans=0.125
2024-08-09 19:18:43,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0
2024-08-09 19:18:44,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=154980.0, ans=0.125
2024-08-09 19:18:47,736 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 26 from LS+wenet, 14 from Vox, 18 from AS
2024-08-09 19:18:53,803 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 from AS
2024-08-09 19:19:07,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=155080.0, ans=0.0
2024-08-09 19:19:12,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=155180.0, ans=0.0
2024-08-09 19:19:15,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.32 vs. limit=22.5
2024-08-09 19:19:17,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0
2024-08-09 19:19:24,813 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 from AS
2024-08-09 19:19:32,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0
2024-08-09 19:19:37,242 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 from AS
2024-08-09 19:19:46,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=155380.0, ans=0.5
2024-08-09 19:19:50,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=155380.0, ans=0.125
2024-08-09 19:19:59,558 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1050, loss[loss=0.1112, beats_loss=0.01093, ecapa_loss=0.0003954, whisper_loss=0.09628, over 14852.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01284, ecapa_loss=0.0003361, whisper_loss=0.1008, over 3823144.29 frames. ], batch size: 57, lr: 2.91e-02, grad_scale: 4096.0
2024-08-09 19:20:19,360 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS
2024-08-09 19:20:19,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155580.0, ans=0.1
2024-08-09 19:20:43,583 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 from AS
2024-08-09 19:20:50,068 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 from AS
2024-08-09 19:21:01,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=155880.0, ans=15.0
2024-08-09 19:21:03,189 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS
2024-08-09 19:21:04,769 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 from AS
2024-08-09 19:21:13,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1100, loss[loss=0.1353, beats_loss=0.009925, ecapa_loss=0.0003763, whisper_loss=0.1216, over 19269.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01281, ecapa_loss=0.0003346, whisper_loss=0.1011, over 3832082.05 frames. ], batch size: 74, lr: 2.90e-02, grad_scale: 4096.0
2024-08-09 19:21:17,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.935e+01 3.266e+01 4.117e+01 7.646e+01, threshold=6.532e+01, percent-clipped=3.0
2024-08-09 19:21:42,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=156180.0, ans=0.0
2024-08-09 19:21:44,419 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 from AS
2024-08-09 19:22:03,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0
2024-08-09 19:22:18,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.65 vs. limit=22.5
2024-08-09 19:22:24,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1150, loss[loss=0.1093, beats_loss=0.01454, ecapa_loss=0.000299, whisper_loss=0.09172, over 22462.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01275, ecapa_loss=0.0003336, whisper_loss=0.1015, over 3827742.06 frames. ], batch size: 88, lr: 2.90e-02, grad_scale: 4096.0
2024-08-09 19:22:30,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=156480.0, ans=0.0
2024-08-09 19:22:50,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=156680.0, ans=0.125
2024-08-09 19:23:07,121 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 22 from Vox, 34 from AS
2024-08-09 19:23:17,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=156880.0, ans=0.0
2024-08-09 19:23:22,030 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-09 19:23:29,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0
2024-08-09 19:23:30,634 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1200, loss[loss=0.09119, beats_loss=0.01451, ecapa_loss=0.0003136, whisper_loss=0.07354, over 15766.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01276, ecapa_loss=0.0003328, whisper_loss=0.1006, over 3805687.23 frames. ], batch size: 64, lr: 2.90e-02, grad_scale: 4096.0
2024-08-09 19:23:33,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.894e+01 3.270e+01 3.890e+01 7.018e+01, threshold=6.539e+01, percent-clipped=1.0
2024-08-09 19:23:40,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5
2024-08-09 19:23:50,987 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 23 from Vox, 33 from AS
2024-08-09 19:23:52,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=157080.0, ans=0.125
2024-08-09 19:24:01,597 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 from AS
2024-08-09 19:24:19,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=157280.0, ans=0.125
2024-08-09 19:24:34,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157380.0, ans=0.0
2024-08-09 19:24:36,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1250, loss[loss=0.1277, beats_loss=0.01125, ecapa_loss=0.0003859, whisper_loss=0.1126, over 21678.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.0128, ecapa_loss=0.0003334, whisper_loss=0.1002, over 3813125.43 frames. ], batch size: 84, lr: 2.89e-02, grad_scale: 4096.0
2024-08-09 19:24:40,070 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 from AS
2024-08-09 19:24:43,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157480.0, ans=0.1
2024-08-09 19:24:43,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=157480.0, ans=0.125
2024-08-09 19:24:44,623 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-08-09 19:24:46,561 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 from AS
2024-08-09 19:24:53,653 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 19:24:55,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=157580.0, ans=0.125
2024-08-09 19:25:01,293 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 from AS
2024-08-09 19:25:03,704 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 from AS
2024-08-09 19:25:22,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=157780.0, ans=0.0
2024-08-09 19:25:41,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1300, loss[loss=0.1036, beats_loss=0.01532, ecapa_loss=0.0002625, whisper_loss=0.08571, over 17674.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01277, ecapa_loss=0.000332, whisper_loss=0.1009, over 3846167.81 frames. ], batch size: 67, lr: 2.89e-02, grad_scale: 4096.0
2024-08-09 19:25:43,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=157980.0, ans=0.125
2024-08-09 19:25:44,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.862e+01 3.141e+01 3.804e+01 7.057e+01, threshold=6.283e+01, percent-clipped=1.0
2024-08-09 19:25:46,990 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 from AS
2024-08-09 19:26:03,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=158080.0, ans=0.0
2024-08-09 19:26:03,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0
2024-08-09 19:26:12,036 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 from AS
2024-08-09 19:26:15,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=158180.0, ans=0.125
2024-08-09 19:26:16,219 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 from AS
2024-08-09 19:26:27,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=158280.0, ans=0.0
2024-08-09 19:26:29,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.91 vs. limit=10.0
2024-08-09 19:26:32,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158280.0, ans=0.1
2024-08-09 19:26:46,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
2024-08-09 19:26:47,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1350, loss[loss=0.1021, beats_loss=0.01508, ecapa_loss=0.0002815, whisper_loss=0.08422, over 17457.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01282, ecapa_loss=0.0003298, whisper_loss=0.1005, over 3839399.53 frames. ], batch size: 70, lr: 2.89e-02, grad_scale: 4096.0
2024-08-09 19:26:51,622 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 30 from Vox, 32 from AS
2024-08-09 19:27:06,218 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 from AS
2024-08-09 19:27:16,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=12.0
2024-08-09 19:27:17,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158680.0, ans=0.125
2024-08-09 19:27:20,912 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 from AS
2024-08-09 19:27:22,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=158680.0, ans=0.125
2024-08-09 19:27:28,755 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 from AS
2024-08-09 19:27:50,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158880.0, ans=0.125
2024-08-09 19:27:53,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1400, loss[loss=0.1041, beats_loss=0.01495, ecapa_loss=0.0002704, whisper_loss=0.08648, over 15185.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01278, ecapa_loss=0.0003293, whisper_loss=0.1012, over 3831033.01 frames. ], batch size: 59, lr: 2.88e-02, grad_scale: 4096.0
2024-08-09 19:27:56,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.826e+01 3.197e+01 3.856e+01 5.556e+01, threshold=6.395e+01, percent-clipped=0.0
2024-08-09 19:27:59,602 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 14 from Vox, 25 from AS
2024-08-09 19:28:10,891 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 from AS
2024-08-09 19:28:11,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=159080.0, ans=0.1
2024-08-09 19:28:19,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=159180.0, ans=0.125
2024-08-09 19:28:21,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=159180.0, ans=0.0
2024-08-09 19:28:26,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0
2024-08-09 19:28:34,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=159280.0, ans=0.0
2024-08-09 19:29:00,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1450, loss[loss=0.09722, beats_loss=0.01327, ecapa_loss=0.0003298, whisper_loss=0.08065, over 13291.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0128, ecapa_loss=0.0003277, whisper_loss=0.1007, over 3809268.14 frames. ], batch size: 54, lr: 2.88e-02, grad_scale: 4096.0
2024-08-09 19:29:26,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=159480.0, ans=0.125
2024-08-09 19:29:28,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=159480.0, ans=0.125
2024-08-09 19:30:01,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.22 vs. limit=6.0
2024-08-09 19:30:02,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159680.0, ans=0.1
2024-08-09 19:30:02,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=159680.0, ans=0.125
2024-08-09 19:30:09,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=159780.0, ans=0.0
2024-08-09 19:30:16,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159780.0, ans=0.125
2024-08-09 19:30:19,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=159880.0, ans=0.0
2024-08-09 19:30:23,719 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 from AS
2024-08-09 19:30:34,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1500, loss[loss=0.124, beats_loss=0.01114, ecapa_loss=0.000321, whisper_loss=0.1096, over 17811.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01279, ecapa_loss=0.0003268, whisper_loss=0.1008, over 3829067.79 frames. ], batch size: 67, lr: 2.87e-02, grad_scale: 4096.0
2024-08-09 19:30:34,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=159980.0, ans=0.0
2024-08-09 19:30:39,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.965e+01 3.414e+01 4.022e+01 6.981e+01, threshold=6.828e+01, percent-clipped=1.0
2024-08-09 19:30:40,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=159980.0, ans=0.2
2024-08-09 19:30:40,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.45 vs. limit=10.0
2024-08-09 19:30:41,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=159980.0, ans=0.125
2024-08-09 19:31:00,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160080.0, ans=0.125
2024-08-09 19:31:00,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0
2024-08-09 19:31:02,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160080.0, ans=0.1
2024-08-09 19:31:03,555 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.449e-01
2024-08-09 19:31:09,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160180.0, ans=0.125
2024-08-09 19:31:09,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0
2024-08-09 19:31:15,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=160180.0, ans=0.0
2024-08-09 19:31:15,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160180.0, ans=0.0
2024-08-09 19:31:18,805 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 17 from LS+wenet, 30 from Vox, 42 from AS
2024-08-09 19:31:21,863 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 from AS
2024-08-09 19:31:28,495 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 from AS
2024-08-09 19:31:45,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=160380.0, ans=0.125
2024-08-09 19:31:54,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1550, loss[loss=0.1346, beats_loss=0.01165, ecapa_loss=0.00034, whisper_loss=0.1196, over 16962.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01269, ecapa_loss=0.0003289, whisper_loss=0.1012, over 3784348.79 frames. ], batch size: 65, lr: 2.87e-02, grad_scale: 8192.0
2024-08-09 19:32:04,134 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.90 vs. limit=22.5
2024-08-09 19:32:04,774 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 29 from Vox, 29 from AS
2024-08-09 19:32:12,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=160580.0, ans=0.0
2024-08-09 19:32:22,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=160580.0, ans=0.125
2024-08-09 19:32:35,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.16 vs. limit=22.5
2024-08-09 19:32:46,252 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 13 from LS+wenet, 27 from Vox, 37 from AS
2024-08-09 19:32:46,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=160780.0, ans=0.0
2024-08-09 19:33:12,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1600, loss[loss=0.1319, beats_loss=0.01034, ecapa_loss=0.0002865, whisper_loss=0.1187, over 23623.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01273, ecapa_loss=0.0003255, whisper_loss=0.1011, over 3796785.60 frames. ], batch size: 88, lr: 2.87e-02, grad_scale: 8192.0
2024-08-09 19:33:14,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=160980.0, ans=0.04949747468305833
2024-08-09 19:33:16,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.968e+01 3.450e+01 4.320e+01 7.036e+01, threshold=6.900e+01, percent-clipped=1.0
2024-08-09 19:33:32,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=161080.0, ans=0.2
2024-08-09 19:33:36,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=161080.0, ans=0.2
2024-08-09 19:34:01,739 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS
2024-08-09 19:34:06,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=161280.0, ans=0.125
2024-08-09 19:34:09,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0
2024-08-09 19:34:30,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1650, loss[loss=0.09934, beats_loss=0.01398, ecapa_loss=0.0003001, whisper_loss=0.08236, over 17573.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01274, ecapa_loss=0.000325, whisper_loss=0.1014, over 3792164.54 frames. ], batch size: 69, lr: 2.86e-02, grad_scale: 8192.0
2024-08-09 19:34:47,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=161580.0, ans=0.1
2024-08-09 19:34:57,721 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 from AS
2024-08-09 19:35:06,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.30 vs. limit=15.0
2024-08-09 19:35:08,127 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 30 from LS+wenet, 16 from Vox, 24 from AS
2024-08-09 19:35:27,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=161780.0, ans=0.125
2024-08-09 19:35:29,580 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.205e-01
2024-08-09 19:35:29,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=161880.0, ans=0.125
2024-08-09 19:35:35,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0
2024-08-09 19:35:38,166 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 from AS
2024-08-09 19:35:40,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=161880.0, ans=0.0
2024-08-09 19:35:40,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=161880.0, ans=0.125
2024-08-09 19:35:45,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1700, loss[loss=0.1123, beats_loss=0.01343, ecapa_loss=0.0003152, whisper_loss=0.09576, over 15431.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01269, ecapa_loss=0.0003236, whisper_loss=0.1016, over 3800195.12 frames. ], batch size: 63, lr: 2.86e-02, grad_scale: 8192.0
2024-08-09 19:35:48,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.753e+01 3.153e+01 3.657e+01 6.641e+01, threshold=6.306e+01, percent-clipped=0.0
2024-08-09 19:36:08,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=162080.0, ans=0.125
2024-08-09 19:36:20,051 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 23 from Vox, 26 from AS
2024-08-09 19:36:36,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5
2024-08-09 19:36:44,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=162380.0, ans=0.0
2024-08-09 19:36:55,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=162380.0, ans=0.0
2024-08-09 19:36:59,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1750, loss[loss=0.1073, beats_loss=0.01149, ecapa_loss=0.0003079, whisper_loss=0.09269, over 14623.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01273, ecapa_loss=0.0003248, whisper_loss=0.1015, over 3807085.96 frames. ], batch size: 54, lr: 2.86e-02, grad_scale: 8192.0
2024-08-09 19:37:02,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=162480.0, ans=0.0
2024-08-09 19:37:12,889 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 from AS
2024-08-09 19:37:16,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5
2024-08-09 19:37:56,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0
2024-08-09 19:38:02,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=162880.0, ans=0.0
2024-08-09 19:38:13,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162880.0, ans=0.125
2024-08-09 19:38:15,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162980.0, ans=0.1
2024-08-09 19:38:16,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1800, loss[loss=0.09045, beats_loss=0.01312, ecapa_loss=0.00027, whisper_loss=0.07463, over 18672.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01272, ecapa_loss=0.0003271, whisper_loss=0.1013, over 3810254.07 frames. ], batch size: 72, lr: 2.85e-02, grad_scale: 8192.0
2024-08-09 19:38:18,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=162980.0, ans=0.0
2024-08-09 19:38:18,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.809e+01 3.330e+01 3.752e+01 6.796e+01, threshold=6.661e+01, percent-clipped=1.0
2024-08-09 19:38:19,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=162980.0, ans=0.125
2024-08-09 19:38:25,071 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 from AS
2024-08-09 19:38:28,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=162980.0, ans=0.125
2024-08-09 19:38:30,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=163080.0, ans=0.125
2024-08-09 19:38:37,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=163080.0, ans=0.125
2024-08-09 19:38:41,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=163080.0, ans=0.125
2024-08-09 19:38:43,614 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 24 from Vox, 24 from AS
2024-08-09 19:38:49,558 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 from AS
2024-08-09 19:39:02,256 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS
2024-08-09 19:39:21,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163380.0, ans=0.1
2024-08-09 19:39:27,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2024-08-09 19:39:31,152 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1850, loss[loss=0.1222, beats_loss=0.01352, ecapa_loss=0.0002879, whisper_loss=0.1058, over 22483.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01263, ecapa_loss=0.0003308, whisper_loss=0.1022, over 3809095.01 frames. ], batch size: 87, lr: 2.85e-02, grad_scale: 8192.0
2024-08-09 19:39:34,111 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 from AS
2024-08-09 19:39:38,611 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-09 19:40:14,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=163780.0, ans=0.125
2024-08-09 19:40:14,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=163780.0, ans=0.2
2024-08-09 19:40:32,481 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 from AS
2024-08-09 19:40:38,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2024-08-09 19:40:38,808 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0
2024-08-09 19:40:40,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0
2024-08-09 19:40:42,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1900, loss[loss=0.1138, beats_loss=0.013, ecapa_loss=0.0003216, whisper_loss=0.09758, over 18268.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01267, ecapa_loss=0.0003386, whisper_loss=0.1019, over 3786420.49 frames. ], batch size: 74, lr: 2.85e-02, grad_scale: 8192.0
2024-08-09 19:40:45,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.888e+01 3.200e+01 3.675e+01 7.363e+01, threshold=6.401e+01, percent-clipped=1.0
2024-08-09 19:40:47,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0
2024-08-09 19:40:53,529 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 from AS
2024-08-09 19:40:58,913 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS
2024-08-09 19:41:03,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=164080.0, ans=0.125
2024-08-09 19:41:07,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=164080.0, ans=0.0
2024-08-09 19:41:15,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=164180.0, ans=0.125
2024-08-09 19:41:22,586 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 from AS
2024-08-09 19:41:28,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=164280.0, ans=0.2
2024-08-09 19:41:30,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=15.0
2024-08-09 19:41:32,306 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 20 from LS+wenet, 31 from Vox, 44 from AS
2024-08-09 19:41:36,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0
2024-08-09 19:41:49,396 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 1950, loss[loss=0.1088, beats_loss=0.01258, ecapa_loss=0.0004097, whisper_loss=0.09213, over 22750.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01273, ecapa_loss=0.0003431, whisper_loss=0.1013, over 3778901.70 frames. ], batch size: 93, lr: 2.84e-02, grad_scale: 8192.0
2024-08-09 19:41:52,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164480.0, ans=0.125
2024-08-09 19:41:56,232 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 19:41:59,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.01 vs. limit=22.5
2024-08-09 19:42:15,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164680.0, ans=0.125
2024-08-09 19:42:25,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0
2024-08-09 19:42:31,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164780.0, ans=0.1
2024-08-09 19:42:32,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=164780.0, ans=0.02
2024-08-09 19:42:36,377 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 from AS
2024-08-09 19:42:42,772 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 from AS
2024-08-09 19:42:46,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=164880.0, ans=0.0
2024-08-09 19:42:55,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2000, loss[loss=0.1065, beats_loss=0.01555, ecapa_loss=0.0004425, whisper_loss=0.08648, over 20591.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01275, ecapa_loss=0.0003458, whisper_loss=0.1017, over 3788481.99 frames. ], batch size: 90, lr: 2.84e-02, grad_scale: 8192.0
2024-08-09 19:42:58,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.959e+01 3.174e+01 3.680e+01 5.777e+01, threshold=6.348e+01, percent-clipped=0.0
2024-08-09 19:43:03,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=164980.0, ans=0.125
2024-08-09 19:43:07,349 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 from AS
2024-08-09 19:43:11,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165080.0, ans=0.125
2024-08-09 19:43:41,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5
2024-08-09 19:43:49,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165380.0, ans=0.1
2024-08-09 19:43:52,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=165380.0, ans=0.09899494936611666
2024-08-09 19:43:58,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=165380.0, ans=0.2
2024-08-09 19:44:01,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0
2024-08-09 19:44:01,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2050, loss[loss=0.1014, beats_loss=0.01426, ecapa_loss=0.0003736, whisper_loss=0.08336, over 22978.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01281, ecapa_loss=0.0003504, whisper_loss=0.1014, over 3806594.45 frames. ], batch size: 93, lr: 2.84e-02, grad_scale: 8192.0
2024-08-09 19:44:17,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=165580.0, ans=0.125
2024-08-09 19:44:29,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0
2024-08-09 19:44:32,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=165680.0, ans=0.125
2024-08-09 19:44:37,255 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 from AS
2024-08-09 19:44:46,123 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts.
13 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 19:44:46,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-09 19:44:54,057 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:44:56,875 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 19:44:59,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=165880.0, ans=0.0 2024-08-09 19:45:02,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=165880.0, ans=0.125 2024-08-09 19:45:05,827 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-09 19:45:06,780 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2100, loss[loss=0.1232, beats_loss=0.01079, ecapa_loss=0.000391, whisper_loss=0.1085, over 22154.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01281, ecapa_loss=0.0003513, whisper_loss=0.1013, over 3788976.75 frames. ], batch size: 89, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:45:08,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=165980.0, ans=0.0 2024-08-09 19:45:09,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.923e+01 3.262e+01 4.036e+01 6.421e+01, threshold=6.525e+01, percent-clipped=1.0 2024-08-09 19:45:30,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. 
limit=22.5 2024-08-09 19:45:32,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=166180.0, ans=0.125 2024-08-09 19:45:37,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=166180.0, ans=0.125 2024-08-09 19:45:39,088 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 19:45:44,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=166180.0, ans=0.125 2024-08-09 19:45:50,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=166280.0, ans=0.125 2024-08-09 19:45:53,150 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 19:45:55,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=15.0 2024-08-09 19:45:57,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.09 vs. limit=22.5 2024-08-09 19:45:58,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166380.0, ans=0.1 2024-08-09 19:45:59,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=15.0 2024-08-09 19:46:12,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2150, loss[loss=0.1178, beats_loss=0.01421, ecapa_loss=0.0003641, whisper_loss=0.0999, over 15516.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01284, ecapa_loss=0.0003515, whisper_loss=0.1012, over 3752534.55 frames. 
], batch size: 63, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:46:14,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=166480.0, ans=0.125 2024-08-09 19:46:25,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=166580.0, ans=0.125 2024-08-09 19:46:39,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=166680.0, ans=0.125 2024-08-09 19:46:53,364 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 19:47:04,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=166880.0, ans=0.2 2024-08-09 19:47:10,260 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-09 19:47:14,581 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 19:47:18,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2200, loss[loss=0.1215, beats_loss=0.0149, ecapa_loss=0.0002659, whisper_loss=0.1039, over 20722.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01275, ecapa_loss=0.0003555, whisper_loss=0.1017, over 3764546.98 frames. ], batch size: 80, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:47:21,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.890e+01 3.143e+01 3.810e+01 5.998e+01, threshold=6.286e+01, percent-clipped=0.0 2024-08-09 19:47:47,218 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-09 19:47:54,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167180.0, ans=0.0 2024-08-09 19:47:55,545 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:48:02,215 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 19:48:10,092 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.560e+03 2024-08-09 19:48:20,552 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.892e+00 2024-08-09 19:48:23,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=167480.0, ans=0.125 2024-08-09 19:48:23,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2250, loss[loss=0.1394, beats_loss=0.01319, ecapa_loss=0.0003722, whisper_loss=0.1225, over 22423.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01289, ecapa_loss=0.0003541, whisper_loss=0.1015, over 3819596.70 frames. 
], batch size: 91, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:48:29,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=167480.0, ans=0.95 2024-08-09 19:48:42,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=167580.0, ans=0.125 2024-08-09 19:48:43,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=167580.0, ans=0.125 2024-08-09 19:48:44,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=167580.0, ans=0.0 2024-08-09 19:48:48,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=167680.0, ans=0.07 2024-08-09 19:48:59,010 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 19:49:00,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=167680.0, ans=0.125 2024-08-09 19:49:28,359 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2300, loss[loss=0.1388, beats_loss=0.01179, ecapa_loss=0.000332, whisper_loss=0.1237, over 22816.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01289, ecapa_loss=0.0003528, whisper_loss=0.1014, over 3838800.93 frames. ], batch size: 89, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:49:31,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 3.098e+01 3.355e+01 3.897e+01 6.798e+01, threshold=6.710e+01, percent-clipped=2.0 2024-08-09 19:49:33,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.58 vs. 
limit=22.5 2024-08-09 19:49:43,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=168080.0, ans=0.125 2024-08-09 19:49:46,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168080.0, ans=0.1 2024-08-09 19:49:47,046 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 25 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-09 19:50:34,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2350, loss[loss=0.1299, beats_loss=0.01132, ecapa_loss=0.0003851, whisper_loss=0.1147, over 23510.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01283, ecapa_loss=0.0003538, whisper_loss=0.1019, over 3874459.01 frames. ], batch size: 94, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:50:40,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=168480.0, ans=0.0 2024-08-09 19:50:43,009 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 19:51:05,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168680.0, ans=0.1 2024-08-09 19:51:29,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168880.0, ans=0.1 2024-08-09 19:51:33,242 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 19:51:33,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:34,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=168880.0, ans=0.2 2024-08-09 19:51:43,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2400, loss[loss=0.1183, beats_loss=0.01387, ecapa_loss=0.0003314, whisper_loss=0.1011, over 22567.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01277, ecapa_loss=0.0003545, whisper_loss=0.1021, over 3863370.72 frames. ], batch size: 90, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:51:46,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.941e+01 3.344e+01 3.819e+01 6.517e+01, threshold=6.689e+01, percent-clipped=0.0 2024-08-09 19:51:48,757 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 19:51:54,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168980.0, ans=0.1 2024-08-09 19:51:56,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=169080.0, ans=0.0 2024-08-09 19:51:57,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.04 vs. 
limit=15.0 2024-08-09 19:52:00,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169080.0, ans=0.1 2024-08-09 19:52:05,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169080.0, ans=0.1 2024-08-09 19:52:23,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=169280.0, ans=0.0 2024-08-09 19:52:24,903 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-09 19:52:31,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=169280.0, ans=0.125 2024-08-09 19:52:37,292 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-09 19:52:50,797 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2450, loss[loss=0.1378, beats_loss=0.011, ecapa_loss=0.0003478, whisper_loss=0.1234, over 23384.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01278, ecapa_loss=0.0003531, whisper_loss=0.1019, over 3842445.09 frames. ], batch size: 89, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:52:51,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2024-08-09 19:52:54,203 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.720e+02 2024-08-09 19:53:02,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=169480.0, ans=0.0 2024-08-09 19:53:20,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.24 vs. 
limit=15.0 2024-08-09 19:53:24,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=169680.0, ans=0.125 2024-08-09 19:53:25,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2024-08-09 19:53:37,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2024-08-09 19:53:53,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=169880.0, ans=0.125 2024-08-09 19:54:00,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2500, loss[loss=0.111, beats_loss=0.01341, ecapa_loss=0.0003503, whisper_loss=0.09405, over 16796.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01259, ecapa_loss=0.0003573, whisper_loss=0.1026, over 3825347.10 frames. 
], batch size: 67, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:54:03,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.848e+01 3.405e+01 3.928e+01 5.880e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-09 19:54:13,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=170080.0, ans=0.0 2024-08-09 19:54:22,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=170080.0, ans=0.125 2024-08-09 19:54:31,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=170180.0, ans=0.0 2024-08-09 19:54:59,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=170380.0, ans=0.1 2024-08-09 19:55:01,939 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 35 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 19:55:12,203 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2550, loss[loss=0.1124, beats_loss=0.01543, ecapa_loss=0.0003448, whisper_loss=0.09352, over 20910.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01259, ecapa_loss=0.0003563, whisper_loss=0.1025, over 3835867.91 frames. ], batch size: 87, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:55:12,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-09 19:55:35,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=170580.0, ans=0.5 2024-08-09 19:55:40,420 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
11 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 19:55:43,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=170680.0, ans=0.0 2024-08-09 19:55:46,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=170680.0, ans=0.125 2024-08-09 19:56:01,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=170780.0, ans=0.0 2024-08-09 19:56:01,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5 2024-08-09 19:56:06,390 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 41 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 19:56:08,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=170780.0, ans=0.0 2024-08-09 19:56:17,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-09 19:56:24,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=170880.0, ans=0.0 2024-08-09 19:56:26,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2600, loss[loss=0.1146, beats_loss=0.01506, ecapa_loss=0.0003242, whisper_loss=0.09632, over 21193.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.0127, ecapa_loss=0.0003545, whisper_loss=0.102, over 3841722.82 frames. ], batch size: 84, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:56:29,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.96 vs. 
limit=22.5 2024-08-09 19:56:29,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.011e+01 3.512e+01 4.102e+01 7.361e+01, threshold=7.024e+01, percent-clipped=2.0 2024-08-09 19:56:30,809 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 19:56:45,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=171080.0, ans=0.2 2024-08-09 19:57:18,955 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:57:21,092 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 19:57:36,875 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2650, loss[loss=0.09288, beats_loss=0.01332, ecapa_loss=0.0003725, whisper_loss=0.07583, over 19994.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01278, ecapa_loss=0.0003525, whisper_loss=0.1024, over 3889542.08 frames. ], batch size: 83, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:57:41,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=171480.0, ans=0.035 2024-08-09 19:58:01,653 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 19:58:05,625 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 19:58:11,875 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.784e-02 2024-08-09 19:58:15,753 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 19:58:17,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=171680.0, ans=0.07 2024-08-09 19:58:41,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.81 vs. limit=15.0 2024-08-09 19:58:48,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2700, loss[loss=0.1294, beats_loss=0.01191, ecapa_loss=0.0004021, whisper_loss=0.1135, over 21185.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01281, ecapa_loss=0.0003539, whisper_loss=0.102, over 3875500.97 frames. ], batch size: 85, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:58:48,501 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 17 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 19:58:51,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.909e+01 3.335e+01 3.725e+01 7.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-09 19:58:55,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=171980.0, ans=0.07 2024-08-09 19:59:01,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=172080.0, ans=0.0 2024-08-09 19:59:06,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=172080.0, ans=0.125 2024-08-09 19:59:20,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=172180.0, ans=0.125 2024-08-09 19:59:34,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.57 vs. 
limit=22.5 2024-08-09 19:59:59,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2750, loss[loss=0.1234, beats_loss=0.01252, ecapa_loss=0.0004126, whisper_loss=0.1068, over 18885.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.0128, ecapa_loss=0.000352, whisper_loss=0.1021, over 3893454.88 frames. ], batch size: 79, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 20:00:55,951 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 20:00:56,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=172880.0, ans=0.2 2024-08-09 20:00:59,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=172880.0, ans=0.2 2024-08-09 20:01:12,569 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2800, loss[loss=0.124, beats_loss=0.009438, ecapa_loss=0.0004804, whisper_loss=0.1097, over 19198.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.0127, ecapa_loss=0.0003536, whisper_loss=0.1021, over 3883914.37 frames. 
], batch size: 80, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:01:15,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 3.001e+01 3.485e+01 3.958e+01 7.033e+01, threshold=6.969e+01, percent-clipped=2.0 2024-08-09 20:01:15,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172980.0, ans=0.1 2024-08-09 20:01:37,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173080.0, ans=0.125 2024-08-09 20:01:42,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173180.0, ans=0.1 2024-08-09 20:01:50,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2024-08-09 20:02:15,795 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-09 20:02:20,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=173380.0, ans=0.2 2024-08-09 20:02:23,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173480.0, ans=0.1 2024-08-09 20:02:24,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2850, loss[loss=0.1127, beats_loss=0.01439, ecapa_loss=0.0003363, whisper_loss=0.09492, over 18472.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01277, ecapa_loss=0.0003525, whisper_loss=0.1017, over 3875937.77 frames. ], batch size: 79, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:02:24,239 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 20:02:30,923 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-09 20:02:35,410 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.591e-02 2024-08-09 20:03:08,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2024-08-09 20:03:14,420 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 20:03:36,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2900, loss[loss=0.1017, beats_loss=0.01359, ecapa_loss=0.0003855, whisper_loss=0.08425, over 14960.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01291, ecapa_loss=0.0003543, whisper_loss=0.1013, over 3876460.81 frames. ], batch size: 62, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:03:40,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.065e+01 3.431e+01 3.879e+01 6.098e+01, threshold=6.862e+01, percent-clipped=0.0 2024-08-09 20:03:43,879 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:03:46,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=173980.0, ans=0.0 2024-08-09 20:03:51,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=174080.0, ans=0.0 2024-08-09 20:04:05,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=174180.0, ans=0.125 2024-08-09 20:04:09,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=174180.0, ans=0.0 2024-08-09 20:04:19,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 
vs. limit=10.0 2024-08-09 20:04:30,837 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-09 20:04:44,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=174380.0, ans=0.125 2024-08-09 20:04:45,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=174380.0, ans=0.1 2024-08-09 20:04:47,889 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 2950, loss[loss=0.1289, beats_loss=0.01329, ecapa_loss=0.0003276, whisper_loss=0.1124, over 23065.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01295, ecapa_loss=0.0003535, whisper_loss=0.1012, over 3871829.74 frames. ], batch size: 93, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:04:50,522 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 20:04:53,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=174480.0, ans=0.125 2024-08-09 20:04:59,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174480.0, ans=0.1 2024-08-09 20:05:19,425 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 20:05:38,073 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 20:05:43,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-09 20:06:14,471 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3000, loss[loss=0.1404, beats_loss=0.01254, ecapa_loss=0.000358, whisper_loss=0.1243, over 23559.00 frames. 
], tot_loss[loss=0.1188, beats_loss=0.01286, ecapa_loss=0.0003536, whisper_loss=0.1024, over 3900525.34 frames. ], batch size: 92, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:06:14,472 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 20:06:58,511 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on ASR_libri: loss=0.2837, beats_loss=0, ecapa_loss=0.001014, whisper_loss=0.2736, over 922467.00 frames. 2024-08-09 20:07:17,193 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on SV_voxceleb1: loss=0.009278, beats_loss=0, ecapa_loss=0.0009278, whisper_loss=0, over 939242.00 frames. 2024-08-09 20:08:50,852 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on AT_audioset: loss=0.03024, beats_loss=0.03024, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 20:08:50,855 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 20:08:53,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.977e+01 3.430e+01 4.027e+01 7.550e+01, threshold=6.860e+01, percent-clipped=3.0 2024-08-09 20:08:53,606 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 20:09:39,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.44 vs. limit=22.5 2024-08-09 20:09:45,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=175280.0, ans=0.0 2024-08-09 20:09:47,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=175280.0, ans=0.2 2024-08-09 20:10:28,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3050, loss[loss=0.1154, beats_loss=0.01506, ecapa_loss=0.000327, whisper_loss=0.09707, over 23151.00 frames. 
], tot_loss[loss=0.119, beats_loss=0.01282, ecapa_loss=0.0003553, whisper_loss=0.1027, over 3868760.01 frames. ], batch size: 92, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:10:48,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=12.0 2024-08-09 20:11:08,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=175580.0, ans=0.125 2024-08-09 20:11:30,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-08-09 20:11:32,105 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 20:11:45,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0 2024-08-09 20:12:10,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=175880.0, ans=0.125 2024-08-09 20:12:21,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3100, loss[loss=0.1061, beats_loss=0.01463, ecapa_loss=0.0004315, whisper_loss=0.08716, over 21539.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01283, ecapa_loss=0.0003553, whisper_loss=0.1022, over 3839210.15 frames. 
], batch size: 93, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:12:25,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.112e+01 3.600e+01 4.119e+01 8.540e+01, threshold=7.200e+01, percent-clipped=4.0 2024-08-09 20:12:39,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=175980.0, ans=0.0 2024-08-09 20:12:39,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.31 vs. limit=22.5 2024-08-09 20:12:54,126 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-09 20:13:04,209 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 20:13:07,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=176180.0, ans=0.125 2024-08-09 20:13:13,783 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-09 20:13:13,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=176180.0, ans=0.95 2024-08-09 20:13:17,178 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 20:13:23,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176280.0, ans=0.1 2024-08-09 20:13:25,409 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.941e+03 2024-08-09 20:14:02,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. 
limit=15.0 2024-08-09 20:14:08,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3150, loss[loss=0.1139, beats_loss=0.01265, ecapa_loss=0.0004471, whisper_loss=0.09674, over 18509.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01278, ecapa_loss=0.0003552, whisper_loss=0.1021, over 3852992.65 frames. ], batch size: 82, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:14:23,329 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 20:14:43,916 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 20:14:47,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176580.0, ans=0.1 2024-08-09 20:15:08,464 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 20:15:13,973 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 20:15:21,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0 2024-08-09 20:15:46,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3200, loss[loss=0.14, beats_loss=0.008849, ecapa_loss=0.0003243, whisper_loss=0.1279, over 17080.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01266, ecapa_loss=0.0003553, whisper_loss=0.1031, over 3852317.57 frames. 
], batch size: 65, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:15:48,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=176980.0, ans=0.125 2024-08-09 20:15:49,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.848e+01 3.292e+01 3.822e+01 6.429e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-09 20:15:54,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.74 vs. limit=22.5 2024-08-09 20:16:15,347 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 20:16:15,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-09 20:16:18,206 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 20:16:38,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=177280.0, ans=0.0 2024-08-09 20:16:55,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=177380.0, ans=0.2 2024-08-09 20:17:00,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3250, loss[loss=0.1414, beats_loss=0.009663, ecapa_loss=0.0004284, whisper_loss=0.1275, over 19797.00 frames. ], tot_loss[loss=0.1195, beats_loss=0.01266, ecapa_loss=0.000355, whisper_loss=0.1033, over 3879860.38 frames. ], batch size: 76, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:17:07,437 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
27 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-09 20:17:10,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=177480.0, ans=0.125 2024-08-09 20:17:34,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=177680.0, ans=0.0 2024-08-09 20:17:38,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=177680.0, ans=0.07 2024-08-09 20:17:49,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=22.5 2024-08-09 20:17:55,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=177780.0, ans=0.0 2024-08-09 20:18:02,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=177880.0, ans=0.125 2024-08-09 20:18:07,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=177880.0, ans=0.0 2024-08-09 20:18:07,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=177880.0, ans=0.125 2024-08-09 20:18:14,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3300, loss[loss=0.1146, beats_loss=0.01298, ecapa_loss=0.0003176, whisper_loss=0.0984, over 19710.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01269, ecapa_loss=0.0003536, whisper_loss=0.1034, over 3895181.14 frames. ], batch size: 77, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:18:18,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.080e+01 3.504e+01 4.263e+01 7.840e+01, threshold=7.009e+01, percent-clipped=4.0 2024-08-09 20:18:19,875 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 20:18:34,801 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-09 20:18:39,009 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 33 from Vox, 25 fro AS 2024-08-09 20:18:44,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=178180.0, ans=0.125 2024-08-09 20:18:49,171 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 20:18:50,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=178180.0, ans=0.125 2024-08-09 20:19:00,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=178280.0, ans=0.0 2024-08-09 20:19:05,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=15.0 2024-08-09 20:19:26,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2024-08-09 20:19:28,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=178380.0, ans=0.125 2024-08-09 20:19:28,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=15.0 2024-08-09 20:19:29,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=178380.0, ans=0.125 2024-08-09 20:19:32,838 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 20:19:36,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3350, loss[loss=0.117, beats_loss=0.01313, ecapa_loss=0.0003469, whisper_loss=0.1003, over 14839.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01269, ecapa_loss=0.000353, whisper_loss=0.1032, over 3887095.35 frames. ], batch size: 58, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:19:43,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=178480.0, ans=0.2 2024-08-09 20:19:43,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-09 20:19:54,644 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.150e-02 2024-08-09 20:19:59,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.50 vs. limit=22.5 2024-08-09 20:20:12,733 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 20:20:23,743 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 20:20:25,142 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 20:20:33,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178780.0, ans=0.0 2024-08-09 20:20:49,742 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 20:20:50,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=178880.0, ans=0.125 2024-08-09 20:20:58,055 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3400, loss[loss=0.1285, beats_loss=0.01013, ecapa_loss=0.0003623, whisper_loss=0.1147, over 22882.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01272, ecapa_loss=0.0003518, whisper_loss=0.1027, over 3880280.67 frames. ], batch size: 90, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:21:00,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.994e+01 3.327e+01 4.294e+01 6.950e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 20:21:36,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179180.0, ans=0.125 2024-08-09 20:21:37,837 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 8 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 20:21:49,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179280.0, ans=0.125 2024-08-09 20:22:21,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3450, loss[loss=0.1142, beats_loss=0.0147, ecapa_loss=0.0003113, whisper_loss=0.09637, over 23356.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01267, ecapa_loss=0.0003516, whisper_loss=0.1029, over 3899572.15 frames. ], batch size: 94, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:22:24,845 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-09 20:22:34,152 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-09 20:22:55,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=179680.0, ans=0.125 2024-08-09 20:23:13,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=179780.0, ans=0.2 2024-08-09 20:23:33,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0 2024-08-09 20:23:43,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3500, loss[loss=0.1162, beats_loss=0.01386, ecapa_loss=0.0003482, whisper_loss=0.09889, over 23181.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01267, ecapa_loss=0.0003489, whisper_loss=0.1028, over 3892390.48 frames. ], batch size: 94, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:23:47,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.966e+01 3.324e+01 3.987e+01 6.193e+01, threshold=6.648e+01, percent-clipped=0.0 2024-08-09 20:23:57,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=179980.0, ans=0.95 2024-08-09 20:23:57,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=179980.0, ans=0.125 2024-08-09 20:24:06,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=180080.0, ans=0.125 2024-08-09 20:24:11,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=180080.0, ans=12.0 2024-08-09 20:24:12,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=180080.0, ans=10.0 2024-08-09 20:24:14,655 INFO [scaling.py:1024] (1/4) Whitening: 
name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-09 20:24:30,384 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-09 20:24:32,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180280.0, ans=0.0 2024-08-09 20:24:32,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180280.0, ans=0.1 2024-08-09 20:24:39,696 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 20:25:08,173 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3550, loss[loss=0.1282, beats_loss=0.01315, ecapa_loss=0.0003121, whisper_loss=0.1119, over 22022.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01264, ecapa_loss=0.0003503, whisper_loss=0.1028, over 3911779.27 frames. ], batch size: 89, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:25:09,683 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-09 20:25:19,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180480.0, ans=0.1 2024-08-09 20:25:26,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=180580.0, ans=0.125 2024-08-09 20:25:46,038 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 20:26:01,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=180780.0, ans=0.125 2024-08-09 20:26:27,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2024-08-09 20:26:35,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3600, loss[loss=0.124, beats_loss=0.01197, ecapa_loss=0.000378, whisper_loss=0.1083, over 22301.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01263, ecapa_loss=0.0003495, whisper_loss=0.1029, over 3907977.07 frames. ], batch size: 91, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:26:38,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.970e+01 3.508e+01 4.140e+01 6.583e+01, threshold=7.015e+01, percent-clipped=0.0 2024-08-09 20:26:45,517 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 20:26:56,058 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-09 20:26:56,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=181080.0, ans=0.0 2024-08-09 20:26:58,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-09 20:27:02,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=181080.0, ans=0.0 2024-08-09 20:27:05,382 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 20:27:14,735 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 20:27:18,258 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
16 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 20:27:21,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=181180.0, ans=0.07 2024-08-09 20:27:29,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=181280.0, ans=0.07 2024-08-09 20:27:30,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.66 vs. limit=15.0 2024-08-09 20:27:37,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=181280.0, ans=0.025 2024-08-09 20:27:49,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=181380.0, ans=0.125 2024-08-09 20:27:56,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3650, loss[loss=0.1266, beats_loss=0.01432, ecapa_loss=0.0003651, whisper_loss=0.1086, over 20689.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.0127, ecapa_loss=0.0003491, whisper_loss=0.1023, over 3846243.20 frames. ], batch size: 87, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:27:57,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=181480.0, ans=0.09899494936611666 2024-08-09 20:27:58,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=181480.0, ans=0.125 2024-08-09 20:28:29,851 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.23 vs. 
limit=22.5 2024-08-09 20:28:37,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181680.0, ans=0.1 2024-08-09 20:28:37,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=181680.0, ans=0.0 2024-08-09 20:28:48,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. limit=6.0 2024-08-09 20:28:53,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=181780.0, ans=0.0 2024-08-09 20:28:54,766 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 20:29:19,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3700, loss[loss=0.1368, beats_loss=0.007107, ecapa_loss=0.0003516, whisper_loss=0.1261, over 15978.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01271, ecapa_loss=0.0003503, whisper_loss=0.1022, over 3857470.33 frames. ], batch size: 58, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:29:22,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.937e+01 3.354e+01 4.017e+01 7.791e+01, threshold=6.707e+01, percent-clipped=1.0 2024-08-09 20:29:22,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-08-09 20:29:27,411 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 20:30:05,704 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 20:30:15,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182280.0, ans=0.1 2024-08-09 20:30:39,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3750, loss[loss=0.1051, beats_loss=0.01502, ecapa_loss=0.0003481, whisper_loss=0.08657, over 21832.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01285, ecapa_loss=0.0003473, whisper_loss=0.1019, over 3901798.15 frames. ], batch size: 90, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:30:41,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=182480.0, ans=0.125 2024-08-09 20:31:04,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=182580.0, ans=0.0 2024-08-09 20:31:10,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2024-08-09 20:31:20,027 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 20:31:26,222 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-09 20:31:29,805 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 20:31:32,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=182780.0, ans=0.125 2024-08-09 20:31:40,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. 
limit=15.0 2024-08-09 20:31:42,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=182880.0, ans=0.2 2024-08-09 20:31:58,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=12.0 2024-08-09 20:31:59,362 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3800, loss[loss=0.1196, beats_loss=0.01204, ecapa_loss=0.0004057, whisper_loss=0.1035, over 20967.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01289, ecapa_loss=0.0003479, whisper_loss=0.1018, over 3901227.45 frames. ], batch size: 87, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:32:01,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.977e+01 3.395e+01 3.964e+01 6.825e+01, threshold=6.789e+01, percent-clipped=1.0 2024-08-09 20:33:16,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3850, loss[loss=0.1397, beats_loss=0.01016, ecapa_loss=0.0003742, whisper_loss=0.1258, over 17600.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01289, ecapa_loss=0.0003455, whisper_loss=0.1022, over 3908491.07 frames. ], batch size: 68, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:33:43,058 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 20:33:50,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=183680.0, ans=0.0 2024-08-09 20:34:29,064 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 20:34:35,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3900, loss[loss=0.1366, beats_loss=0.01266, ecapa_loss=0.0003382, whisper_loss=0.1206, over 22567.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01282, ecapa_loss=0.000347, whisper_loss=0.103, over 3928692.77 frames. 
], batch size: 88, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:34:37,240 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.880e+02 2024-08-09 20:34:38,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.932e+01 3.278e+01 3.846e+01 7.989e+01, threshold=6.556e+01, percent-clipped=2.0 2024-08-09 20:34:45,565 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 20:35:05,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184080.0, ans=0.0 2024-08-09 20:35:25,964 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.637e+00 2024-08-09 20:35:39,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=184380.0, ans=0.04949747468305833 2024-08-09 20:35:44,931 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 20:35:50,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=184380.0, ans=0.125 2024-08-09 20:35:52,209 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-09 20:35:56,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 3950, loss[loss=0.1017, beats_loss=0.01573, ecapa_loss=0.0002876, whisper_loss=0.08306, over 21275.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01281, ecapa_loss=0.0003464, whisper_loss=0.1036, over 3951839.67 frames. ], batch size: 88, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:36:04,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=184480.0, ans=0.0 2024-08-09 20:36:11,202 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 20:36:23,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184580.0, ans=0.125 2024-08-09 20:36:26,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-09 20:36:32,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=184680.0, ans=0.125 2024-08-09 20:36:34,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=184680.0, ans=0.125 2024-08-09 20:36:46,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=184780.0, ans=0.125 2024-08-09 20:36:57,503 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 20:37:14,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4000, loss[loss=0.1373, beats_loss=0.01364, ecapa_loss=0.0003538, whisper_loss=0.1202, over 15133.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01266, ecapa_loss=0.0003484, whisper_loss=0.1044, over 3941034.78 frames. ], batch size: 63, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:37:16,242 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 20:37:17,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+01 2.965e+01 3.379e+01 3.827e+01 6.548e+01, threshold=6.758e+01, percent-clipped=0.0 2024-08-09 20:37:19,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=184980.0, ans=0.2 2024-08-09 20:37:28,603 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 20:37:32,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-09 20:37:43,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=185180.0, ans=0.125 2024-08-09 20:37:44,759 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-09 20:37:47,463 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-09 20:37:58,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185280.0, ans=0.1 2024-08-09 20:38:09,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=185280.0, ans=0.015 2024-08-09 20:38:21,461 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 20:38:30,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4050, loss[loss=0.1323, beats_loss=0.01185, ecapa_loss=0.0002864, whisper_loss=0.1176, over 23099.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01272, ecapa_loss=0.0003463, whisper_loss=0.1039, over 3934610.80 frames. ], batch size: 85, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:38:49,939 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-09 20:38:52,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=185580.0, ans=0.125 2024-08-09 20:39:38,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=185980.0, ans=0.125 2024-08-09 20:39:39,497 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4100, loss[loss=0.1546, beats_loss=0.0116, ecapa_loss=0.0003228, whisper_loss=0.1397, over 23775.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01273, ecapa_loss=0.0003448, whisper_loss=0.1036, over 3932993.88 frames. ], batch size: 91, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:39:42,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 3.015e+01 3.336e+01 4.132e+01 1.372e+02, threshold=6.672e+01, percent-clipped=1.0 2024-08-09 20:39:45,025 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-09 20:40:04,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=186080.0, ans=0.125 2024-08-09 20:40:11,799 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:40:18,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=186280.0, ans=0.0 2024-08-09 20:40:27,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186280.0, ans=0.1 2024-08-09 20:40:31,360 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
29 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-09 20:40:34,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=186380.0, ans=0.0 2024-08-09 20:40:45,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=186480.0, ans=0.125 2024-08-09 20:40:45,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4150, loss[loss=0.1233, beats_loss=0.01353, ecapa_loss=0.0003161, whisper_loss=0.1066, over 23419.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01272, ecapa_loss=0.0003448, whisper_loss=0.1036, over 3930946.38 frames. ], batch size: 91, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:40:53,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=186480.0, ans=0.0 2024-08-09 20:40:59,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2024-08-09 20:41:11,762 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 20:41:34,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=186780.0, ans=0.0 2024-08-09 20:41:39,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=186880.0, ans=0.125 2024-08-09 20:41:45,944 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-09 20:41:52,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4200, loss[loss=0.1195, beats_loss=0.01316, ecapa_loss=0.0003892, whisper_loss=0.1024, over 22015.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01273, ecapa_loss=0.0003431, whisper_loss=0.1034, over 3901011.70 frames. 
], batch size: 92, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:41:54,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.958e+01 3.347e+01 3.898e+01 6.800e+01, threshold=6.694e+01, percent-clipped=1.0 2024-08-09 20:42:00,564 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 20:42:06,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=187080.0, ans=0.125 2024-08-09 20:42:24,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-09 20:42:26,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=187180.0, ans=0.5 2024-08-09 20:42:32,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2024-08-09 20:42:34,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=187280.0, ans=0.125 2024-08-09 20:42:56,895 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 20:42:58,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4250, loss[loss=0.1216, beats_loss=0.01308, ecapa_loss=0.0003154, whisper_loss=0.1053, over 22126.00 frames. ], tot_loss[loss=0.1199, beats_loss=0.01267, ecapa_loss=0.0003417, whisper_loss=0.1038, over 3956353.74 frames. 
], batch size: 89, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:43:03,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=187480.0, ans=0.0 2024-08-09 20:43:11,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=187580.0, ans=0.125 2024-08-09 20:43:22,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=187580.0, ans=0.2 2024-08-09 20:43:28,081 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 20:43:44,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187780.0, ans=0.1 2024-08-09 20:43:50,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.80 vs. limit=10.0 2024-08-09 20:44:03,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4300, loss[loss=0.1322, beats_loss=0.01268, ecapa_loss=0.0002874, whisper_loss=0.1167, over 23728.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01268, ecapa_loss=0.0003423, whisper_loss=0.103, over 3908354.00 frames. ], batch size: 90, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:44:06,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.942e+01 3.508e+01 4.302e+01 6.032e+01, threshold=7.016e+01, percent-clipped=0.0 2024-08-09 20:44:12,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=187980.0, ans=0.025 2024-08-09 20:44:19,845 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 20:44:34,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=188180.0, ans=0.125 2024-08-09 20:44:44,908 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-09 20:44:45,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2024-08-09 20:44:50,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=188280.0, ans=0.125 2024-08-09 20:45:01,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=15.0 2024-08-09 20:45:02,062 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 20:45:09,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4350, loss[loss=0.1071, beats_loss=0.01342, ecapa_loss=0.0003893, whisper_loss=0.08984, over 17101.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01272, ecapa_loss=0.0003455, whisper_loss=0.1019, over 3884771.77 frames. ], batch size: 68, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:45:11,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2024-08-09 20:45:15,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=188480.0, ans=0.125 2024-08-09 20:45:21,916 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-09 20:45:27,321 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 20:45:31,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188580.0, ans=0.125 2024-08-09 20:45:35,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=188680.0, ans=0.125 2024-08-09 20:45:39,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188680.0, ans=0.1 2024-08-09 20:45:50,927 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 20:46:01,227 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-09 20:46:11,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-09 20:46:20,438 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4400, loss[loss=0.1177, beats_loss=0.01043, ecapa_loss=0.0004239, whisper_loss=0.1031, over 17730.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01268, ecapa_loss=0.000343, whisper_loss=0.1023, over 3861258.82 frames. 
], batch size: 73, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:46:23,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.890e+01 3.311e+01 3.807e+01 6.108e+01, threshold=6.622e+01, percent-clipped=0.0 2024-08-09 20:46:27,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=188980.0, ans=0.0 2024-08-09 20:46:39,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=189080.0, ans=0.0 2024-08-09 20:46:56,211 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.032e+01 2024-08-09 20:46:59,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=189180.0, ans=0.125 2024-08-09 20:47:04,778 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-09 20:47:08,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=189280.0, ans=0.125 2024-08-09 20:47:10,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=189280.0, ans=0.125 2024-08-09 20:47:19,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189280.0, ans=0.125 2024-08-09 20:47:37,174 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 20:47:38,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4450, loss[loss=0.1221, beats_loss=0.01421, ecapa_loss=0.0003166, whisper_loss=0.1047, over 21369.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01278, ecapa_loss=0.0003409, whisper_loss=0.1018, over 3850187.51 frames. 
], batch size: 83, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:47:40,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=189480.0, ans=0.0 2024-08-09 20:47:41,624 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 20:47:48,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-09 20:47:55,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=189580.0, ans=0.0 2024-08-09 20:48:30,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=189780.0, ans=0.125 2024-08-09 20:48:32,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-08-09 20:48:35,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189780.0, ans=0.1 2024-08-09 20:49:02,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4500, loss[loss=0.09171, beats_loss=0.01304, ecapa_loss=0.0003426, whisper_loss=0.07524, over 20245.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01275, ecapa_loss=0.0003405, whisper_loss=0.1014, over 3873554.77 frames. 
], batch size: 82, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:49:06,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.955e+01 3.431e+01 3.879e+01 5.998e+01, threshold=6.863e+01, percent-clipped=0.0 2024-08-09 20:49:33,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=190080.0, ans=0.125 2024-08-09 20:49:41,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=190180.0, ans=0.125 2024-08-09 20:49:45,527 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-09 20:50:06,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=190380.0, ans=0.125 2024-08-09 20:50:24,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4550, loss[loss=0.1239, beats_loss=0.01207, ecapa_loss=0.0003476, whisper_loss=0.1084, over 20761.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01269, ecapa_loss=0.0003426, whisper_loss=0.1019, over 3875975.81 frames. ], batch size: 84, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:50:29,345 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 20:50:40,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=190580.0, ans=0.2 2024-08-09 20:50:53,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190580.0, ans=0.1 2024-08-09 20:50:58,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=190680.0, ans=0.125 2024-08-09 20:51:24,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=190780.0, ans=0.0 2024-08-09 20:51:31,245 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-09 20:51:41,018 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 20:51:45,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4600, loss[loss=0.1081, beats_loss=0.01249, ecapa_loss=0.0003189, whisper_loss=0.09239, over 18658.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.0127, ecapa_loss=0.000343, whisper_loss=0.102, over 3896827.67 frames. ], batch size: 71, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:51:48,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.933e+01 3.481e+01 4.250e+01 8.633e+01, threshold=6.961e+01, percent-clipped=3.0 2024-08-09 20:51:51,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2024-08-09 20:51:58,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.64 vs. 
limit=12.0 2024-08-09 20:52:12,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=191080.0, ans=0.125 2024-08-09 20:52:21,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=191180.0, ans=0.125 2024-08-09 20:52:38,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191280.0, ans=0.1 2024-08-09 20:52:42,949 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 20:52:44,308 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-09 20:52:50,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=191380.0, ans=0.125 2024-08-09 20:53:01,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=191380.0, ans=0.125 2024-08-09 20:53:05,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4650, loss[loss=0.1515, beats_loss=0.01299, ecapa_loss=0.0002759, whisper_loss=0.1357, over 16170.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01262, ecapa_loss=0.0003442, whisper_loss=0.1025, over 3859358.76 frames. ], batch size: 59, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:53:14,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191480.0, ans=0.125 2024-08-09 20:53:15,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=191480.0, ans=0.125 2024-08-09 20:53:15,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.29 vs. 
limit=15.0 2024-08-09 20:54:02,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-09 20:54:10,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-08-09 20:54:20,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=191880.0, ans=0.0 2024-08-09 20:54:25,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4700, loss[loss=0.1112, beats_loss=0.0167, ecapa_loss=0.0002688, whisper_loss=0.09185, over 21999.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01266, ecapa_loss=0.0003444, whisper_loss=0.102, over 3836313.69 frames. ], batch size: 87, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:54:28,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.995e+01 3.606e+01 4.056e+01 7.854e+01, threshold=7.212e+01, percent-clipped=1.0 2024-08-09 20:54:31,435 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 20:54:34,740 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 20:54:38,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191980.0, ans=0.1 2024-08-09 20:54:47,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=192080.0, ans=0.125 2024-08-09 20:54:57,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=15.0 2024-08-09 20:55:00,579 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 20:55:23,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192280.0, ans=0.1 2024-08-09 20:55:31,568 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 20:55:36,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=192380.0, ans=0.125 2024-08-09 20:55:39,018 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 20:55:41,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192380.0, ans=0.1 2024-08-09 20:55:45,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4750, loss[loss=0.1075, beats_loss=0.01174, ecapa_loss=0.0003808, whisper_loss=0.09197, over 19640.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.0127, ecapa_loss=0.0003465, whisper_loss=0.1017, over 3841023.82 frames. ], batch size: 81, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:56:21,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.29 vs. limit=10.0 2024-08-09 20:56:29,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=192680.0, ans=0.0 2024-08-09 20:56:32,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=192780.0, ans=0.125 2024-08-09 20:56:51,633 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 20:56:51,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=192880.0, ans=0.04949747468305833 2024-08-09 20:57:01,488 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 20:57:02,694 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 20:57:04,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4800, loss[loss=0.1038, beats_loss=0.01438, ecapa_loss=0.0003468, whisper_loss=0.086, over 15338.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.0128, ecapa_loss=0.0003463, whisper_loss=0.1013, over 3848082.96 frames. ], batch size: 64, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:57:07,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 3.258e+01 3.599e+01 4.060e+01 6.614e+01, threshold=7.198e+01, percent-clipped=0.0 2024-08-09 20:57:12,617 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.134e-02 2024-08-09 20:57:18,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=193080.0, ans=0.1 2024-08-09 20:57:23,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.34 vs. limit=15.0 2024-08-09 20:57:24,594 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 29 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-09 20:58:04,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-08-09 20:58:10,951 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-09 20:58:13,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=193380.0, ans=0.125 2024-08-09 20:58:17,907 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4850, loss[loss=0.1437, beats_loss=0.007403, ecapa_loss=0.0004072, whisper_loss=0.1323, over 14380.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01279, ecapa_loss=0.000346, whisper_loss=0.1015, over 3843645.20 frames. ], batch size: 54, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:58:21,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=193480.0, ans=0.0 2024-08-09 20:58:25,230 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 20:58:29,322 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 20:58:29,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-09 20:58:33,560 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 20:58:35,036 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 20:58:50,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193680.0, ans=0.1 2024-08-09 20:58:56,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=193680.0, ans=0.125 2024-08-09 20:58:58,681 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
19 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-09 20:59:07,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-08-09 20:59:08,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=193780.0, ans=0.2 2024-08-09 20:59:19,324 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 20:59:25,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=193880.0, ans=0.07 2024-08-09 20:59:27,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4900, loss[loss=0.115, beats_loss=0.01368, ecapa_loss=0.000347, whisper_loss=0.0979, over 19401.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01285, ecapa_loss=0.0003453, whisper_loss=0.101, over 3855541.01 frames. ], batch size: 79, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:59:30,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.990e+01 3.252e+01 3.746e+01 5.696e+01, threshold=6.504e+01, percent-clipped=0.0 2024-08-09 20:59:44,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2024-08-09 20:59:56,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=194180.0, ans=0.07 2024-08-09 20:59:57,169 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 18 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 21:00:07,615 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-09 21:00:22,613 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 21:00:31,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194380.0, ans=0.1 2024-08-09 21:00:36,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 4950, loss[loss=0.109, beats_loss=0.01481, ecapa_loss=0.0002607, whisper_loss=0.0916, over 21515.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01285, ecapa_loss=0.0003414, whisper_loss=0.1009, over 3863642.21 frames. ], batch size: 82, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 21:00:41,571 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-09 21:00:51,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=194580.0, ans=0.2 2024-08-09 21:01:09,926 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-09 21:01:11,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=194680.0, ans=0.0 2024-08-09 21:01:15,676 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 21:01:34,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=194880.0, ans=0.1 2024-08-09 21:01:40,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=194880.0, ans=0.0 2024-08-09 21:01:43,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5000, loss[loss=0.1207, beats_loss=0.01372, ecapa_loss=0.0002998, whisper_loss=0.104, over 22641.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01291, ecapa_loss=0.0003405, whisper_loss=0.1008, over 3878157.10 frames. 
], batch size: 90, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:01:44,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=194980.0, ans=0.125 2024-08-09 21:01:46,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.882e+01 3.259e+01 3.861e+01 5.497e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-09 21:01:50,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=194980.0, ans=0.0 2024-08-09 21:01:52,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-08-09 21:01:53,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-08-09 21:02:03,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.95 vs. limit=5.0 2024-08-09 21:02:14,825 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 21:02:21,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-09 21:02:28,666 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 21:02:37,836 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-09 21:02:51,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5050, loss[loss=0.1321, beats_loss=0.01272, ecapa_loss=0.0003384, whisper_loss=0.116, over 22519.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01293, ecapa_loss=0.000342, whisper_loss=0.1013, over 3904827.04 frames. 
], batch size: 89, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:10,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=195580.0, ans=0.125 2024-08-09 21:03:16,257 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 21:03:23,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195680.0, ans=0.125 2024-08-09 21:03:52,111 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 21:03:52,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=195880.0, ans=0.04949747468305833 2024-08-09 21:03:55,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2024-08-09 21:03:57,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5100, loss[loss=0.1359, beats_loss=0.009582, ecapa_loss=0.0004221, whisper_loss=0.1221, over 19166.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01295, ecapa_loss=0.000341, whisper_loss=0.101, over 3927012.74 frames. ], batch size: 78, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:58,827 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 21:03:59,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.875e+01 3.306e+01 3.993e+01 6.485e+01, threshold=6.613e+01, percent-clipped=0.0 2024-08-09 21:04:44,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196280.0, ans=0.125 2024-08-09 21:04:47,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=196280.0, ans=0.125 2024-08-09 21:04:58,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=12.0 2024-08-09 21:05:04,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5 2024-08-09 21:05:05,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5150, loss[loss=0.08187, beats_loss=0.0158, ecapa_loss=0.0003718, whisper_loss=0.06235, over 21191.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01293, ecapa_loss=0.0003391, whisper_loss=0.1016, over 3935934.10 frames. ], batch size: 91, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:05:07,013 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-09 21:05:10,168 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 37 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-09 21:05:18,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. 
limit=10.0 2024-08-09 21:05:20,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=196580.0, ans=0.2 2024-08-09 21:05:28,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=196580.0, ans=0.04949747468305833 2024-08-09 21:05:41,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=196680.0, ans=0.035 2024-08-09 21:05:51,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=196780.0, ans=0.0 2024-08-09 21:06:10,760 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 21:06:13,489 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5200, loss[loss=0.1149, beats_loss=0.01404, ecapa_loss=0.0003317, whisper_loss=0.0975, over 15977.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01286, ecapa_loss=0.0003368, whisper_loss=0.1017, over 3904818.49 frames. ], batch size: 62, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:06:16,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.861e+01 3.315e+01 3.921e+01 5.764e+01, threshold=6.630e+01, percent-clipped=0.0 2024-08-09 21:06:18,968 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 21:06:36,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.49 vs. limit=15.0 2024-08-09 21:06:50,703 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 21:07:16,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2024-08-09 21:07:21,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5250, loss[loss=0.1188, beats_loss=0.01211, ecapa_loss=0.0003222, whisper_loss=0.1035, over 22630.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01282, ecapa_loss=0.0003379, whisper_loss=0.1018, over 3895230.92 frames. ], batch size: 92, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:07:21,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197480.0, ans=0.1 2024-08-09 21:07:23,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=15.0 2024-08-09 21:07:32,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=197480.0, ans=10.0 2024-08-09 21:07:44,643 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 21:07:45,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=197580.0, ans=0.0 2024-08-09 21:08:20,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=197880.0, ans=0.125 2024-08-09 21:08:27,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=197880.0, ans=0.015 2024-08-09 21:08:27,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=197880.0, ans=0.0 2024-08-09 21:08:28,150 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-09 21:08:30,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5300, loss[loss=0.1164, beats_loss=0.01595, ecapa_loss=0.0002583, whisper_loss=0.09788, over 18172.00 frames. 
], tot_loss[loss=0.1177, beats_loss=0.0128, ecapa_loss=0.0003376, whisper_loss=0.1016, over 3886013.44 frames. ], batch size: 72, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:08:30,791 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.667e-01 2024-08-09 21:08:33,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.918e+01 3.459e+01 4.148e+01 6.900e+01, threshold=6.919e+01, percent-clipped=2.0 2024-08-09 21:08:55,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=198080.0, ans=0.125 2024-08-09 21:09:09,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198180.0, ans=0.125 2024-08-09 21:09:17,880 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 21:09:21,016 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-09 21:09:39,360 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 21:09:40,427 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5350, loss[loss=0.1174, beats_loss=0.01056, ecapa_loss=0.0003797, whisper_loss=0.1031, over 16293.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01272, ecapa_loss=0.0003379, whisper_loss=0.1011, over 3868754.80 frames. 
], batch size: 64, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:09:43,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=198480.0, ans=10.0 2024-08-09 21:09:45,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=198480.0, ans=0.125 2024-08-09 21:10:03,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=198580.0, ans=0.125 2024-08-09 21:10:03,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=198580.0, ans=0.5 2024-08-09 21:10:06,853 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 21:10:09,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2024-08-09 21:10:11,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=198680.0, ans=0.125 2024-08-09 21:10:18,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198680.0, ans=0.1 2024-08-09 21:10:18,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=198680.0, ans=0.125 2024-08-09 21:10:21,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=198680.0, ans=0.0 2024-08-09 21:10:37,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. 
limit=10.0 2024-08-09 21:10:46,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198880.0, ans=0.1 2024-08-09 21:10:46,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=198880.0, ans=0.125 2024-08-09 21:10:52,702 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5400, loss[loss=0.1199, beats_loss=0.01264, ecapa_loss=0.00036, whisper_loss=0.1036, over 22847.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01274, ecapa_loss=0.0003369, whisper_loss=0.1007, over 3847914.79 frames. ], batch size: 89, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:10:55,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.905e+01 3.438e+01 3.898e+01 7.093e+01, threshold=6.876e+01, percent-clipped=1.0 2024-08-09 21:10:57,109 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 21:11:02,761 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 29 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 21:11:17,045 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 21:11:35,228 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 21:11:55,183 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 21:12:00,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=199380.0, ans=0.0 2024-08-09 21:12:02,647 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 11 from Vox, 52 fro AS 2024-08-09 21:12:06,770 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5450, loss[loss=0.1133, beats_loss=0.01384, ecapa_loss=0.0003315, whisper_loss=0.09611, over 22922.00 frames. 
], tot_loss[loss=0.1169, beats_loss=0.01276, ecapa_loss=0.0003377, whisper_loss=0.1008, over 3839381.36 frames. ], batch size: 90, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:12:08,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=199480.0, ans=0.0 2024-08-09 21:12:17,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=199480.0, ans=0.0 2024-08-09 21:12:21,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=199580.0, ans=0.125 2024-08-09 21:12:25,104 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 21:12:29,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=199580.0, ans=0.125 2024-08-09 21:12:46,661 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 21:12:49,305 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 21:12:58,017 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 21:13:02,905 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 21:13:18,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5500, loss[loss=0.1501, beats_loss=0.01023, ecapa_loss=0.0003576, whisper_loss=0.1363, over 22949.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01269, ecapa_loss=0.0003392, whisper_loss=0.1015, over 3846992.48 frames. 
], batch size: 88, lr: 2.61e-02, grad_scale: 16384.0 2024-08-09 21:13:23,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 3.012e+01 3.355e+01 3.811e+01 5.286e+01, threshold=6.711e+01, percent-clipped=0.0 2024-08-09 21:13:31,050 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 31 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-09 21:13:42,360 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 21:13:55,826 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.600e+00 2024-08-09 21:14:04,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-08-09 21:14:21,623 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 21:14:31,914 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-09 21:14:33,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5550, loss[loss=0.129, beats_loss=0.01321, ecapa_loss=0.0002908, whisper_loss=0.1129, over 21727.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01287, ecapa_loss=0.0003377, whisper_loss=0.101, over 3889658.68 frames. 
], batch size: 85, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:14:33,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=200480.0, ans=0.125 2024-08-09 21:14:41,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=200480.0, ans=0.125 2024-08-09 21:14:49,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=200580.0, ans=0.125 2024-08-09 21:15:14,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-09 21:15:16,896 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 21:15:21,043 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 21:15:31,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=200880.0, ans=0.125 2024-08-09 21:15:45,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=200980.0, ans=0.2 2024-08-09 21:15:46,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5600, loss[loss=0.09822, beats_loss=0.0152, ecapa_loss=0.0002552, whisper_loss=0.08046, over 21066.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01283, ecapa_loss=0.000336, whisper_loss=0.1012, over 3876184.89 frames. 
], batch size: 85, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:15:48,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200980.0, ans=0.125 2024-08-09 21:15:49,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 3.019e+01 3.603e+01 4.139e+01 2.249e+02, threshold=7.206e+01, percent-clipped=7.0 2024-08-09 21:15:50,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=12.0 2024-08-09 21:16:08,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=201080.0, ans=0.0 2024-08-09 21:16:14,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=201180.0, ans=0.1 2024-08-09 21:16:21,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=201180.0, ans=0.125 2024-08-09 21:16:27,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201180.0, ans=0.1 2024-08-09 21:16:27,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2024-08-09 21:16:33,645 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 21:16:38,795 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 21:16:56,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5650, loss[loss=0.1202, beats_loss=0.01272, ecapa_loss=0.0004115, whisper_loss=0.1034, over 20795.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01291, ecapa_loss=0.0003325, whisper_loss=0.1008, over 3891150.43 frames. 
], batch size: 86, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:17:02,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=201480.0, ans=0.125 2024-08-09 21:17:07,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2024-08-09 21:17:13,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=201580.0, ans=0.2 2024-08-09 21:17:16,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=201580.0, ans=0.0 2024-08-09 21:17:19,989 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-09 21:17:26,805 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 21:17:41,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=201780.0, ans=0.0 2024-08-09 21:17:41,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201780.0, ans=0.0 2024-08-09 21:17:44,045 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 18 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 21:17:46,960 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 21:17:57,192 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 33 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 21:18:01,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-08-09 21:18:01,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. 
limit=6.0 2024-08-09 21:18:03,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5700, loss[loss=0.1452, beats_loss=0.01051, ecapa_loss=0.0003303, whisper_loss=0.1314, over 20487.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01293, ecapa_loss=0.0003311, whisper_loss=0.1008, over 3922776.02 frames. ], batch size: 78, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:18:06,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 3.095e+01 3.448e+01 4.225e+01 7.062e+01, threshold=6.897e+01, percent-clipped=0.0 2024-08-09 21:18:20,022 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-09 21:18:38,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=202180.0, ans=0.0 2024-08-09 21:18:50,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202280.0, ans=0.125 2024-08-09 21:19:01,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=202380.0, ans=0.0 2024-08-09 21:19:10,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5750, loss[loss=0.1324, beats_loss=0.008464, ecapa_loss=0.000397, whisper_loss=0.1199, over 15924.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01292, ecapa_loss=0.0003333, whisper_loss=0.1004, over 3904281.11 frames. 
], batch size: 60, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:19:11,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202480.0, ans=0.125 2024-08-09 21:19:16,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=202480.0, ans=0.0 2024-08-09 21:19:20,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=202480.0, ans=0.0 2024-08-09 21:19:34,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=202580.0, ans=0.05 2024-08-09 21:19:35,438 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 21:19:39,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=202680.0, ans=0.5 2024-08-09 21:19:42,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=202680.0, ans=0.2 2024-08-09 21:19:56,881 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 21:19:58,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-09 21:20:03,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=202880.0, ans=0.025 2024-08-09 21:20:07,183 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-09 21:20:07,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=202880.0, ans=0.2 2024-08-09 21:20:17,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5800, loss[loss=0.1244, beats_loss=0.01338, ecapa_loss=0.000331, whisper_loss=0.1077, over 20143.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01288, ecapa_loss=0.0003369, whisper_loss=0.1006, over 3892246.58 frames. ], batch size: 80, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:20:20,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.100e+01 3.407e+01 4.370e+01 6.410e+01, threshold=6.814e+01, percent-clipped=0.0 2024-08-09 21:20:25,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2024-08-09 21:20:39,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=203080.0, ans=0.05 2024-08-09 21:20:41,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2024-08-09 21:20:42,808 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.83 vs. limit=15.0 2024-08-09 21:20:50,202 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-09 21:21:01,842 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 21:21:08,857 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
37 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 21:21:10,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-08-09 21:21:12,760 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 21:21:14,075 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 21:21:24,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5850, loss[loss=0.1012, beats_loss=0.01223, ecapa_loss=0.0003361, whisper_loss=0.08558, over 18781.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01289, ecapa_loss=0.0003367, whisper_loss=0.101, over 3902893.39 frames. ], batch size: 75, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:21:27,586 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-09 21:21:29,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=203480.0, ans=0.125 2024-08-09 21:21:43,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=203580.0, ans=0.125 2024-08-09 21:21:46,179 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 18 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-09 21:21:48,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=12.0 2024-08-09 21:21:59,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. 
limit=15.0 2024-08-09 21:22:03,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=203780.0, ans=0.125 2024-08-09 21:22:14,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203780.0, ans=0.1 2024-08-09 21:22:16,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0 2024-08-09 21:22:23,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=203880.0, ans=0.07 2024-08-09 21:22:27,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=203880.0, ans=0.0 2024-08-09 21:22:30,309 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 21:22:31,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5900, loss[loss=0.1202, beats_loss=0.01125, ecapa_loss=0.0003516, whisper_loss=0.1055, over 19872.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.0129, ecapa_loss=0.0003384, whisper_loss=0.1001, over 3878058.68 frames. ], batch size: 80, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:22:34,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.068e+01 3.370e+01 4.019e+01 7.434e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-09 21:22:52,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=204080.0, ans=0.2 2024-08-09 21:23:10,817 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-09 21:23:13,726 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 21:23:21,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=204280.0, ans=0.2 2024-08-09 21:23:35,436 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 21:23:38,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=204480.0, ans=0.125 2024-08-09 21:23:39,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 5950, loss[loss=0.1026, beats_loss=0.01415, ecapa_loss=0.0003149, whisper_loss=0.0853, over 22455.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01296, ecapa_loss=0.0003345, whisper_loss=0.09971, over 3901043.62 frames. ], batch size: 92, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:23:45,553 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 21:24:07,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=204680.0, ans=0.125 2024-08-09 21:24:26,171 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 21:24:44,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6000, loss[loss=0.0848, beats_loss=0.01552, ecapa_loss=0.0002299, whisper_loss=0.06698, over 16148.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01298, ecapa_loss=0.0003314, whisper_loss=0.1001, over 3892252.62 frames. ], batch size: 64, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:24:44,491 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 21:25:26,056 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on ASR_libri: loss=0.2831, beats_loss=0, ecapa_loss=0.0009654, whisper_loss=0.2734, over 922467.00 frames. 
2024-08-09 21:25:44,570 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on SV_voxceleb1: loss=0.008561, beats_loss=0, ecapa_loss=0.0008561, whisper_loss=0, over 939242.00 frames. 2024-08-09 21:27:41,190 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on AT_audioset: loss=0.03036, beats_loss=0.03036, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 21:27:41,194 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 21:27:42,829 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 21:27:43,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.831e+01 3.333e+01 3.565e+01 5.881e+01, threshold=6.666e+01, percent-clipped=0.0 2024-08-09 21:27:44,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=204980.0, ans=0.125 2024-08-09 21:27:49,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=204980.0, ans=0.0 2024-08-09 21:28:06,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=205080.0, ans=0.2 2024-08-09 21:28:07,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-09 21:28:14,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=205180.0, ans=0.125 2024-08-09 21:28:26,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=205280.0, ans=0.0 2024-08-09 21:28:48,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6050, loss[loss=0.1095, beats_loss=0.01444, ecapa_loss=0.0003004, whisper_loss=0.09206, over 21029.00 frames. 
], tot_loss[loss=0.1165, beats_loss=0.01288, ecapa_loss=0.000333, whisper_loss=0.1003, over 3873162.22 frames. ], batch size: 82, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:29:15,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-09 21:29:18,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=205680.0, ans=22.5 2024-08-09 21:29:24,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=205680.0, ans=0.0 2024-08-09 21:29:28,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=205780.0, ans=0.0 2024-08-09 21:29:35,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205780.0, ans=0.1 2024-08-09 21:29:38,139 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 19 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-09 21:29:51,159 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-09 21:29:54,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=205980.0, ans=0.125 2024-08-09 21:29:54,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6100, loss[loss=0.08851, beats_loss=0.0134, ecapa_loss=0.000358, whisper_loss=0.07153, over 16525.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0129, ecapa_loss=0.0003358, whisper_loss=0.1003, over 3876662.04 frames. 
], batch size: 69, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:29:57,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=205980.0, ans=0.125 2024-08-09 21:29:57,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.058e+01 3.470e+01 4.090e+01 8.250e+01, threshold=6.939e+01, percent-clipped=1.0 2024-08-09 21:29:58,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=205980.0, ans=0.125 2024-08-09 21:30:06,102 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 21:30:08,572 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 21:30:12,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=206080.0, ans=0.2 2024-08-09 21:30:13,990 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 21:30:19,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=206080.0, ans=0.125 2024-08-09 21:30:40,828 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 21:30:43,584 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 21:31:03,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6150, loss[loss=0.1106, beats_loss=0.01208, ecapa_loss=0.0003538, whisper_loss=0.09499, over 22327.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01281, ecapa_loss=0.0003356, whisper_loss=0.1005, over 3907218.02 frames. 
], batch size: 93, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:31:23,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=15.0 2024-08-09 21:31:30,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2024-08-09 21:31:31,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-09 21:31:37,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=206680.0, ans=0.2 2024-08-09 21:31:38,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=206680.0, ans=0.0 2024-08-09 21:32:01,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=206880.0, ans=0.125 2024-08-09 21:32:10,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6200, loss[loss=0.1011, beats_loss=0.01481, ecapa_loss=0.0003191, whisper_loss=0.08314, over 14621.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01278, ecapa_loss=0.0003348, whisper_loss=0.1004, over 3900209.44 frames. 
], batch size: 58, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:32:13,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 3.042e+01 3.611e+01 4.258e+01 6.640e+01, threshold=7.222e+01, percent-clipped=0.0 2024-08-09 21:32:20,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=206980.0, ans=0.125 2024-08-09 21:32:30,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=207080.0, ans=0.2 2024-08-09 21:32:34,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2024-08-09 21:32:35,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=207080.0, ans=0.2 2024-08-09 21:32:51,633 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 21:32:54,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=207280.0, ans=0.2 2024-08-09 21:33:18,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6250, loss[loss=0.1025, beats_loss=0.01558, ecapa_loss=0.0002268, whisper_loss=0.08463, over 18607.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01277, ecapa_loss=0.0003349, whisper_loss=0.1002, over 3880174.63 frames. 
], batch size: 71, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:33:35,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=207580.0, ans=15.0 2024-08-09 21:33:37,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=207580.0, ans=0.0 2024-08-09 21:33:38,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=207580.0, ans=0.125 2024-08-09 21:33:53,989 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 21:34:06,018 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 21:34:06,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=207780.0, ans=0.125 2024-08-09 21:34:13,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-09 21:34:22,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=207880.0, ans=0.0 2024-08-09 21:34:27,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6300, loss[loss=0.1198, beats_loss=0.01015, ecapa_loss=0.0003813, whisper_loss=0.1058, over 14467.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01268, ecapa_loss=0.0003348, whisper_loss=0.1011, over 3897038.08 frames. 
], batch size: 60, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:34:30,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.893e+01 3.305e+01 3.810e+01 5.470e+01, threshold=6.610e+01, percent-clipped=0.0 2024-08-09 21:34:36,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=207980.0, ans=0.125 2024-08-09 21:35:05,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2024-08-09 21:35:16,422 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 21:35:35,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6350, loss[loss=0.1141, beats_loss=0.01396, ecapa_loss=0.0003749, whisper_loss=0.09641, over 22413.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01269, ecapa_loss=0.0003362, whisper_loss=0.1016, over 3886356.05 frames. ], batch size: 90, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:35:35,905 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-09 21:35:39,639 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 21:35:42,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=208480.0, ans=0.2 2024-08-09 21:35:57,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. 
limit=22.5 2024-08-09 21:35:59,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=208580.0, ans=0.125 2024-08-09 21:36:27,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=208780.0, ans=0.125 2024-08-09 21:36:35,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=208880.0, ans=0.2 2024-08-09 21:36:44,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6400, loss[loss=0.1182, beats_loss=0.01623, ecapa_loss=0.0002761, whisper_loss=0.09925, over 23250.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01259, ecapa_loss=0.0003359, whisper_loss=0.1026, over 3897641.76 frames. ], batch size: 90, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:36:48,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 3.030e+01 3.423e+01 4.041e+01 6.749e+01, threshold=6.846e+01, percent-clipped=1.0 2024-08-09 21:36:51,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=208980.0, ans=0.125 2024-08-09 21:36:57,205 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2024-08-09 21:37:06,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=209080.0, ans=0.125 2024-08-09 21:37:19,871 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-09 21:37:32,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209280.0, ans=0.125 2024-08-09 21:37:44,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0 2024-08-09 21:37:54,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6450, loss[loss=0.1242, beats_loss=0.01084, ecapa_loss=0.0004197, whisper_loss=0.1092, over 21127.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01263, ecapa_loss=0.0003367, whisper_loss=0.1019, over 3902186.06 frames. ], batch size: 87, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:38:10,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=209580.0, ans=0.125 2024-08-09 21:38:25,741 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 21:38:28,281 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 21:38:36,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=209780.0, ans=0.0 2024-08-09 21:38:46,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=209780.0, ans=0.125 2024-08-09 21:38:53,327 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 21:39:04,634 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6500, loss[loss=0.1161, beats_loss=0.01322, ecapa_loss=0.000339, whisper_loss=0.09951, over 15369.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01268, ecapa_loss=0.0003337, whisper_loss=0.1018, over 3883014.17 frames. 
], batch size: 62, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:39:05,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209980.0, ans=0.1 2024-08-09 21:39:07,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2024-08-09 21:39:07,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.878e+01 3.238e+01 3.656e+01 8.439e+01, threshold=6.476e+01, percent-clipped=1.0 2024-08-09 21:39:16,034 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 21:39:33,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=210180.0, ans=0.2 2024-08-09 21:39:42,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-09 21:39:46,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=210280.0, ans=0.0 2024-08-09 21:39:55,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=210280.0, ans=0.0 2024-08-09 21:39:59,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=210380.0, ans=0.125 2024-08-09 21:40:01,997 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 21:40:02,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=210380.0, ans=0.2 2024-08-09 21:40:08,929 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
8 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 21:40:09,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.42 vs. limit=10.0 2024-08-09 21:40:10,211 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 21:40:10,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=210380.0, ans=0.125 2024-08-09 21:40:14,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6550, loss[loss=0.09942, beats_loss=0.01242, ecapa_loss=0.0002833, whisper_loss=0.08417, over 16543.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01273, ecapa_loss=0.0003304, whisper_loss=0.1018, over 3885233.87 frames. ], batch size: 65, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:40:20,319 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.914e+00 2024-08-09 21:40:22,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2024-08-09 21:40:29,390 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 21:40:39,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=210580.0, ans=0.05 2024-08-09 21:40:46,041 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 21:40:54,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=210780.0, ans=0.2 2024-08-09 21:40:58,580 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-09 21:41:00,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=210780.0, ans=0.125 2024-08-09 21:41:04,131 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-09 21:41:11,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=15.0 2024-08-09 21:41:19,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=210880.0, ans=0.0 2024-08-09 21:41:22,215 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6600, loss[loss=0.1074, beats_loss=0.01247, ecapa_loss=0.0004047, whisper_loss=0.09084, over 16150.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01276, ecapa_loss=0.0003297, whisper_loss=0.1016, over 3891784.24 frames. ], batch size: 67, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:41:24,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.037e+01 3.483e+01 4.077e+01 6.253e+01, threshold=6.966e+01, percent-clipped=0.0 2024-08-09 21:41:26,584 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 21:41:28,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=210980.0, ans=0.0 2024-08-09 21:41:37,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211080.0, ans=0.1 2024-08-09 21:41:38,740 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-09 21:42:05,249 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.208e-01 2024-08-09 21:42:17,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=211380.0, ans=0.0 2024-08-09 21:42:29,171 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-09 21:42:31,817 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6650, loss[loss=0.1416, beats_loss=0.01057, ecapa_loss=0.0003813, whisper_loss=0.1272, over 20881.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01272, ecapa_loss=0.0003307, whisper_loss=0.1019, over 3876066.67 frames. ], batch size: 81, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:42:36,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=211480.0, ans=0.125 2024-08-09 21:42:46,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=211580.0, ans=0.0 2024-08-09 21:42:51,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211580.0, ans=0.1 2024-08-09 21:42:53,073 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 18 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 21:42:53,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=211580.0, ans=0.125 2024-08-09 21:43:17,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211780.0, ans=0.1 2024-08-09 21:43:38,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6700, loss[loss=0.1035, beats_loss=0.01452, ecapa_loss=0.000291, whisper_loss=0.08603, over 22944.00 frames. 
], tot_loss[loss=0.1181, beats_loss=0.01268, ecapa_loss=0.0003316, whisper_loss=0.1021, over 3918781.32 frames. ], batch size: 89, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:43:41,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.049e+01 3.429e+01 4.303e+01 7.619e+01, threshold=6.858e+01, percent-clipped=1.0 2024-08-09 21:44:07,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=212180.0, ans=0.2 2024-08-09 21:44:07,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0 2024-08-09 21:44:11,739 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 21:44:23,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=212280.0, ans=0.0 2024-08-09 21:44:47,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6750, loss[loss=0.09565, beats_loss=0.01484, ecapa_loss=0.0002902, whisper_loss=0.07791, over 23566.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.0127, ecapa_loss=0.000333, whisper_loss=0.1013, over 3878077.21 frames. ], batch size: 93, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:45:12,766 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-09 21:45:26,539 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 21:45:29,049 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 21:45:30,368 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 21:45:33,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=212780.0, ans=0.125 2024-08-09 21:45:37,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=212780.0, ans=0.0 2024-08-09 21:45:43,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=212880.0, ans=0.0 2024-08-09 21:45:56,207 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6800, loss[loss=0.1294, beats_loss=0.009, ecapa_loss=0.000386, whisper_loss=0.1166, over 18755.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0127, ecapa_loss=0.0003328, whisper_loss=0.1007, over 3859046.47 frames. ], batch size: 73, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:45:56,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=212980.0, ans=0.125 2024-08-09 21:45:58,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.928e+01 3.409e+01 4.100e+01 8.566e+01, threshold=6.819e+01, percent-clipped=2.0 2024-08-09 21:46:03,382 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 21:46:03,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-09 21:46:04,599 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 21:46:08,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=213080.0, ans=0.125 2024-08-09 21:46:10,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=213080.0, ans=0.2 2024-08-09 21:46:23,154 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 21:46:38,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=213280.0, ans=0.09899494936611666 2024-08-09 21:46:47,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-08-09 21:46:51,674 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 21:46:54,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=213380.0, ans=0.125 2024-08-09 21:47:03,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6850, loss[loss=0.1217, beats_loss=0.01183, ecapa_loss=0.0003396, whisper_loss=0.1065, over 23812.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01275, ecapa_loss=0.0003327, whisper_loss=0.1007, over 3850215.47 frames. ], batch size: 94, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:47:14,688 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 21:47:21,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=213580.0, ans=0.2 2024-08-09 21:47:28,169 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 21:47:32,331 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 21:47:39,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=213680.0, ans=0.0 2024-08-09 21:47:41,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=213680.0, ans=0.0 2024-08-09 21:47:44,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-09 21:47:47,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=213780.0, ans=0.125 2024-08-09 21:47:51,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-09 21:48:10,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6900, loss[loss=0.1058, beats_loss=0.0121, ecapa_loss=0.000311, whisper_loss=0.09059, over 21053.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01285, ecapa_loss=0.000331, whisper_loss=0.1005, over 3880958.58 frames. ], batch size: 83, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:48:12,638 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 21:48:13,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.002e+01 3.455e+01 4.166e+01 7.035e+01, threshold=6.909e+01, percent-clipped=1.0 2024-08-09 21:48:21,074 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 21:48:25,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=214080.0, ans=0.04949747468305833 2024-08-09 21:48:29,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=214080.0, ans=0.2 2024-08-09 21:48:31,872 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 21:48:32,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2024-08-09 21:48:37,040 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 21:48:53,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214280.0, ans=0.1 2024-08-09 21:48:57,069 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-09 21:49:04,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0 2024-08-09 21:49:07,699 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 21:49:10,206 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-09 21:49:15,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=214380.0, ans=10.0 2024-08-09 21:49:17,852 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 6950, loss[loss=0.09763, beats_loss=0.0142, ecapa_loss=0.0003458, whisper_loss=0.07997, over 21118.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01289, ecapa_loss=0.0003313, whisper_loss=0.1003, over 3888732.26 frames. 
], batch size: 88, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:49:31,697 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 21:49:34,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=214580.0, ans=0.125 2024-08-09 21:49:34,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=214580.0, ans=0.125 2024-08-09 21:50:11,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-09 21:50:16,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-08-09 21:50:20,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2024-08-09 21:50:24,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7000, loss[loss=0.1101, beats_loss=0.01352, ecapa_loss=0.0003288, whisper_loss=0.09326, over 22842.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01277, ecapa_loss=0.000331, whisper_loss=0.101, over 3891949.29 frames. ], batch size: 91, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:50:27,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.842e+01 3.336e+01 4.058e+01 9.243e+01, threshold=6.672e+01, percent-clipped=2.0 2024-08-09 21:50:27,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.61 vs. 
limit=22.5 2024-08-09 21:50:35,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=214980.0, ans=0.125 2024-08-09 21:50:35,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=214980.0, ans=0.0 2024-08-09 21:50:40,653 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 21:50:47,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.20 vs. limit=22.5 2024-08-09 21:50:50,953 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 21:50:51,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215180.0, ans=0.1 2024-08-09 21:50:53,552 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 21:50:58,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-08-09 21:51:30,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215380.0, ans=0.125 2024-08-09 21:51:31,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.22 vs. limit=15.0 2024-08-09 21:51:33,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7050, loss[loss=0.1026, beats_loss=0.009613, ecapa_loss=0.0003513, whisper_loss=0.08947, over 14733.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01264, ecapa_loss=0.0003318, whisper_loss=0.1017, over 3884219.55 frames. 
], batch size: 56, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:51:36,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=215480.0, ans=0.0 2024-08-09 21:51:43,120 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 21:51:51,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=215580.0, ans=0.125 2024-08-09 21:51:51,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=215580.0, ans=0.125 2024-08-09 21:51:54,495 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 21:51:59,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=215680.0, ans=0.125 2024-08-09 21:51:59,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=215680.0, ans=0.0 2024-08-09 21:52:06,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.05 vs. limit=6.0 2024-08-09 21:52:18,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=215780.0, ans=0.125 2024-08-09 21:52:41,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7100, loss[loss=0.1012, beats_loss=0.01402, ecapa_loss=0.0003644, whisper_loss=0.08355, over 20240.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01272, ecapa_loss=0.00033, whisper_loss=0.1015, over 3904290.06 frames. 
], batch size: 85, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:52:43,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.849e+01 3.267e+01 3.796e+01 6.737e+01, threshold=6.534e+01, percent-clipped=1.0 2024-08-09 21:52:47,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=215980.0, ans=0.125 2024-08-09 21:52:50,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=215980.0, ans=0.0 2024-08-09 21:52:59,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=216080.0, ans=0.0 2024-08-09 21:53:03,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=216080.0, ans=0.2 2024-08-09 21:53:03,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=216080.0, ans=0.125 2024-08-09 21:53:27,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=216280.0, ans=0.0 2024-08-09 21:53:30,009 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-09 21:53:35,384 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 21:53:40,698 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-09 21:53:48,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7150, loss[loss=0.1135, beats_loss=0.009157, ecapa_loss=0.000347, whisper_loss=0.1008, over 14956.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01279, ecapa_loss=0.0003277, whisper_loss=0.1009, over 3917518.24 frames. 
], batch size: 59, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:54:03,441 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 21:54:05,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-08-09 21:54:06,108 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 21:54:17,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:17,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:19,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=216680.0, ans=0.0 2024-08-09 21:54:20,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=216680.0, ans=0.125 2024-08-09 21:54:51,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=216880.0, ans=0.07 2024-08-09 21:54:54,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7200, loss[loss=0.1184, beats_loss=0.01483, ecapa_loss=0.0002586, whisper_loss=0.101, over 18765.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01281, ecapa_loss=0.0003273, whisper_loss=0.1004, over 3930477.00 frames. 
], batch size: 73, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:54:57,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 3.192e+01 3.694e+01 4.293e+01 6.634e+01, threshold=7.388e+01, percent-clipped=1.0 2024-08-09 21:55:21,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=217180.0, ans=0.2 2024-08-09 21:55:25,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=217180.0, ans=0.125 2024-08-09 21:55:39,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217280.0, ans=0.125 2024-08-09 21:55:46,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217380.0, ans=0.1 2024-08-09 21:55:47,894 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 21:55:51,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:55:53,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217380.0, ans=0.125 2024-08-09 21:56:00,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7250, loss[loss=0.1008, beats_loss=0.01373, ecapa_loss=0.0003563, whisper_loss=0.08351, over 21346.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01282, ecapa_loss=0.000327, whisper_loss=0.1003, over 3961111.56 frames. ], batch size: 90, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:56:10,249 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 21:56:12,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. 
limit=22.5 2024-08-09 21:56:24,882 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 21:56:37,217 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-09 21:56:43,666 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 21:56:49,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217780.0, ans=0.125 2024-08-09 21:57:01,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=217880.0, ans=10.0 2024-08-09 21:57:06,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=217980.0, ans=0.125 2024-08-09 21:57:07,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7300, loss[loss=0.1004, beats_loss=0.01473, ecapa_loss=0.0003334, whisper_loss=0.0823, over 14570.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01284, ecapa_loss=0.000326, whisper_loss=0.1002, over 3948710.61 frames. ], batch size: 59, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:57:10,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 3.021e+01 3.524e+01 4.153e+01 7.749e+01, threshold=7.049e+01, percent-clipped=1.0 2024-08-09 21:57:11,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2024-08-09 21:57:29,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=218080.0, ans=0.0 2024-08-09 21:57:29,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=218080.0, ans=0.0 2024-08-09 21:57:33,668 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 21:57:40,470 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 21:57:59,587 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 21:58:01,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218380.0, ans=0.1 2024-08-09 21:58:06,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=218380.0, ans=0.0 2024-08-09 21:58:15,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7350, loss[loss=0.1043, beats_loss=0.01355, ecapa_loss=0.0003124, whisper_loss=0.08764, over 22345.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01284, ecapa_loss=0.0003254, whisper_loss=0.09995, over 3909043.98 frames. ], batch size: 91, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:58:20,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.82 vs. 
limit=15.0 2024-08-09 21:58:23,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218480.0, ans=0.1 2024-08-09 21:58:34,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=218580.0, ans=0.125 2024-08-09 21:58:45,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=218680.0, ans=0.2 2024-08-09 21:58:45,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=218680.0, ans=0.04949747468305833 2024-08-09 21:58:50,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2024-08-09 21:59:15,662 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:59:22,022 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7400, loss[loss=0.1088, beats_loss=0.01139, ecapa_loss=0.0003745, whisper_loss=0.09369, over 22434.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01286, ecapa_loss=0.0003264, whisper_loss=0.09997, over 3899291.49 frames. ], batch size: 92, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:59:22,238 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 21:59:24,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.912e+01 3.245e+01 3.982e+01 7.444e+01, threshold=6.489e+01, percent-clipped=1.0 2024-08-09 21:59:26,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=218980.0, ans=0.125 2024-08-09 21:59:29,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=218980.0, ans=0.125 2024-08-09 21:59:44,194 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-09 21:59:48,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=219180.0, ans=6.0 2024-08-09 21:59:48,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.93 vs. limit=10.0 2024-08-09 21:59:52,534 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-09 22:00:06,765 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-09 22:00:08,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2024-08-09 22:00:27,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7450, loss[loss=0.1199, beats_loss=0.01099, ecapa_loss=0.0003244, whisper_loss=0.1057, over 20067.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01283, ecapa_loss=0.0003249, whisper_loss=0.1006, over 3896143.07 frames. 
], batch size: 79, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:00:50,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=219580.0, ans=0.125 2024-08-09 22:00:52,427 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 22:01:05,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-09 22:01:19,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=219880.0, ans=0.1 2024-08-09 22:01:27,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=219880.0, ans=0.125 2024-08-09 22:01:32,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7500, loss[loss=0.1045, beats_loss=0.01252, ecapa_loss=0.0003645, whisper_loss=0.08835, over 19520.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01287, ecapa_loss=0.0003267, whisper_loss=0.1003, over 3888339.38 frames. ], batch size: 79, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:01:34,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.195e+01 3.556e+01 4.126e+01 6.406e+01, threshold=7.112e+01, percent-clipped=0.0 2024-08-09 22:01:40,245 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 22:01:40,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-08-09 22:01:41,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=219980.0, ans=0.125 2024-08-09 22:01:44,183 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
23 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-09 22:01:53,497 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 22:01:53,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220080.0, ans=0.1 2024-08-09 22:02:02,791 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 22:02:17,260 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 22:02:23,988 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-09 22:02:27,738 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 22:02:30,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=220380.0, ans=0.0 2024-08-09 22:02:38,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7550, loss[loss=0.1072, beats_loss=0.01063, ecapa_loss=0.0003324, whisper_loss=0.09322, over 16987.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01288, ecapa_loss=0.0003294, whisper_loss=0.09961, over 3868930.41 frames. ], batch size: 70, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:02:39,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.94 vs. limit=22.5 2024-08-09 22:02:53,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.25 vs. limit=22.5 2024-08-09 22:03:02,315 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 22:03:18,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=220780.0, ans=0.04949747468305833 2024-08-09 22:03:31,164 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 22:03:43,907 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7600, loss[loss=0.1226, beats_loss=0.01281, ecapa_loss=0.0003044, whisper_loss=0.1068, over 22583.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01283, ecapa_loss=0.0003298, whisper_loss=0.09959, over 3845805.57 frames. ], batch size: 88, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:03:45,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220980.0, ans=0.1 2024-08-09 22:03:46,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.898e+01 3.243e+01 3.786e+01 9.374e+01, threshold=6.487e+01, percent-clipped=2.0 2024-08-09 22:04:05,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=221080.0, ans=0.125 2024-08-09 22:04:08,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=221080.0, ans=0.125 2024-08-09 22:04:17,920 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 22:04:22,322 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 22:04:22,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=221180.0, ans=0.125 2024-08-09 22:04:23,637 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 22:04:33,447 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 22:04:51,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7650, loss[loss=0.1133, beats_loss=0.01442, ecapa_loss=0.0003545, whisper_loss=0.09536, over 17388.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01266, ecapa_loss=0.0003319, whisper_loss=0.1005, over 3807980.31 frames. ], batch size: 71, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:05:00,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=221480.0, ans=0.0 2024-08-09 22:05:06,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:06,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:11,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:12,772 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-09 22:05:26,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=221680.0, ans=0.125 2024-08-09 22:05:33,499 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 11 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 22:05:50,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=221780.0, ans=0.125 2024-08-09 22:05:53,071 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-09 22:05:58,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=221880.0, ans=0.04949747468305833 2024-08-09 22:06:11,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-09 22:06:13,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7700, loss[loss=0.1049, beats_loss=0.01268, ecapa_loss=0.0002809, whisper_loss=0.08939, over 18326.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.0127, ecapa_loss=0.0003326, whisper_loss=0.1005, over 3811201.08 frames. ], batch size: 72, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:06:15,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.870e+01 3.289e+01 3.671e+01 6.131e+01, threshold=6.578e+01, percent-clipped=0.0 2024-08-09 22:06:18,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=221980.0, ans=0.0 2024-08-09 22:06:24,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=221980.0, ans=0.125 2024-08-09 22:06:37,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=222080.0, ans=0.0 2024-08-09 22:06:46,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222080.0, ans=0.1 2024-08-09 22:06:50,703 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 32 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 22:06:59,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.01 vs. limit=15.0 2024-08-09 22:07:26,174 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 22:07:46,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=222380.0, ans=0.0 2024-08-09 22:07:59,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7750, loss[loss=0.125, beats_loss=0.01268, ecapa_loss=0.0003232, whisper_loss=0.1091, over 14810.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01273, ecapa_loss=0.0003346, whisper_loss=0.1003, over 3823267.89 frames. ], batch size: 59, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:08:20,503 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-09 22:08:25,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=222580.0, ans=0.0 2024-08-09 22:08:46,764 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 22:09:03,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=222880.0, ans=0.05 2024-08-09 22:09:05,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=222880.0, ans=0.125 2024-08-09 22:09:09,690 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 22:09:16,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7800, loss[loss=0.1082, beats_loss=0.01276, ecapa_loss=0.0003294, whisper_loss=0.09219, over 22876.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01278, ecapa_loss=0.0003328, whisper_loss=0.1006, over 3861278.83 frames. 
], batch size: 95, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:09:19,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.196e+01 3.636e+01 4.618e+01 8.254e+01, threshold=7.273e+01, percent-clipped=2.0 2024-08-09 22:09:23,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=222980.0, ans=0.125 2024-08-09 22:09:28,746 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-09 22:10:15,797 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 22:10:22,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223380.0, ans=0.1 2024-08-09 22:10:30,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223380.0, ans=0.125 2024-08-09 22:10:32,561 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7850, loss[loss=0.1388, beats_loss=0.00865, ecapa_loss=0.0003481, whisper_loss=0.1267, over 14692.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01266, ecapa_loss=0.000334, whisper_loss=0.1021, over 3892254.43 frames. ], batch size: 54, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:10:42,080 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 22:10:44,542 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 22:10:55,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.98 vs. 
limit=22.5 2024-08-09 22:11:04,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=223680.0, ans=0.2 2024-08-09 22:11:19,417 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 22:11:32,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=223880.0, ans=0.0 2024-08-09 22:11:39,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=223880.0, ans=0.0 2024-08-09 22:11:47,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7900, loss[loss=0.1195, beats_loss=0.01163, ecapa_loss=0.0003414, whisper_loss=0.1045, over 22557.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01274, ecapa_loss=0.0003312, whisper_loss=0.1009, over 3873540.07 frames. ], batch size: 90, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:11:50,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.925e+01 3.324e+01 4.014e+01 6.320e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-09 22:11:54,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=12.0 2024-08-09 22:12:10,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=224080.0, ans=0.0 2024-08-09 22:12:13,557 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 22:12:16,333 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-09 22:12:24,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=224180.0, ans=0.2 2024-08-09 22:12:42,525 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 22:12:59,598 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-09 22:13:01,174 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 22:13:05,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=224480.0, ans=0.125 2024-08-09 22:13:06,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 7950, loss[loss=0.1209, beats_loss=0.01081, ecapa_loss=0.0003607, whisper_loss=0.1065, over 22193.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01283, ecapa_loss=0.0003285, whisper_loss=0.1, over 3896983.63 frames. ], batch size: 90, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:13:07,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224480.0, ans=0.1 2024-08-09 22:13:16,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=224480.0, ans=0.0 2024-08-09 22:13:31,596 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 22:13:39,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=22.5 2024-08-09 22:13:43,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=224680.0, ans=0.0 2024-08-09 22:13:43,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=224680.0, ans=0.2 2024-08-09 22:13:55,492 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 22:13:58,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=224780.0, ans=0.125 2024-08-09 22:14:07,522 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 22:14:12,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=224880.0, ans=0.125 2024-08-09 22:14:19,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=224980.0, ans=0.2 2024-08-09 22:14:20,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8000, loss[loss=0.09039, beats_loss=0.01457, ecapa_loss=0.0003479, whisper_loss=0.07234, over 21817.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01281, ecapa_loss=0.0003273, whisper_loss=0.0994, over 3892038.95 frames. ], batch size: 91, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:14:23,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 3.124e+01 3.387e+01 3.961e+01 6.094e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-09 22:14:25,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=224980.0, ans=0.125 2024-08-09 22:14:45,121 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-09 22:14:58,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=225180.0, ans=0.125 2024-08-09 22:15:16,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=225280.0, ans=0.0 2024-08-09 22:15:35,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8050, loss[loss=0.1224, beats_loss=0.01178, ecapa_loss=0.0003255, whisper_loss=0.1074, over 22805.00 frames. 
], tot_loss[loss=0.1163, beats_loss=0.01272, ecapa_loss=0.0003261, whisper_loss=0.1003, over 3882220.90 frames. ], batch size: 90, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:15:39,444 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-09 22:15:43,980 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 22:15:48,498 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-09 22:15:54,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=225580.0, ans=0.125 2024-08-09 22:16:00,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=225580.0, ans=0.125 2024-08-09 22:16:43,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=225880.0, ans=0.0 2024-08-09 22:16:50,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8100, loss[loss=0.1283, beats_loss=0.01037, ecapa_loss=0.0003929, whisper_loss=0.114, over 18262.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.0003241, whisper_loss=0.1006, over 3859055.03 frames. ], batch size: 73, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:16:53,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.949e+01 3.347e+01 3.946e+01 6.724e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-09 22:16:55,258 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-09 22:17:25,542 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 22:17:28,540 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 22:17:32,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=226180.0, ans=0.125 2024-08-09 22:17:43,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-09 22:17:56,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=226380.0, ans=0.125 2024-08-09 22:18:03,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226380.0, ans=0.1 2024-08-09 22:18:05,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8150, loss[loss=0.14, beats_loss=0.009552, ecapa_loss=0.0003516, whisper_loss=0.1269, over 21086.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01269, ecapa_loss=0.0003237, whisper_loss=0.1009, over 3866328.54 frames. ], batch size: 80, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:18:15,378 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
37 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 22:18:15,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=226480.0, ans=0.0 2024-08-09 22:18:20,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=226580.0, ans=0.035 2024-08-09 22:18:28,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=226580.0, ans=0.0 2024-08-09 22:18:33,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=226580.0, ans=0.125 2024-08-09 22:18:42,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=226680.0, ans=0.0 2024-08-09 22:18:50,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0 2024-08-09 22:18:53,251 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 22:19:01,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=226780.0, ans=0.2 2024-08-09 22:19:05,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=226880.0, ans=0.0 2024-08-09 22:19:11,870 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 22:19:15,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-08-09 22:19:23,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8200, loss[loss=0.1251, beats_loss=0.01198, ecapa_loss=0.0003126, whisper_loss=0.11, over 20298.00 frames. 
], tot_loss[loss=0.1168, beats_loss=0.01264, ecapa_loss=0.0003256, whisper_loss=0.1009, over 3899457.30 frames. ], batch size: 83, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:19:25,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 3.072e+01 3.518e+01 4.235e+01 6.207e+01, threshold=7.036e+01, percent-clipped=0.0 2024-08-09 22:19:26,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.87 vs. limit=15.0 2024-08-09 22:19:31,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=226980.0, ans=0.0 2024-08-09 22:19:31,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226980.0, ans=0.1 2024-08-09 22:19:41,622 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-09 22:19:44,962 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 22:19:46,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=227080.0, ans=0.2 2024-08-09 22:20:04,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227180.0, ans=0.1 2024-08-09 22:20:15,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.40 vs. 
limit=15.0 2024-08-09 22:20:19,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=227280.0, ans=0.2 2024-08-09 22:20:22,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227380.0, ans=0.1 2024-08-09 22:20:39,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8250, loss[loss=0.09545, beats_loss=0.01331, ecapa_loss=0.0003425, whisper_loss=0.07872, over 20967.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01271, ecapa_loss=0.0003241, whisper_loss=0.1006, over 3903606.97 frames. ], batch size: 88, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:20:41,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-09 22:20:44,071 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 22:20:47,197 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 22:21:05,817 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-09 22:21:27,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2024-08-09 22:21:31,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=227780.0, ans=0.125 2024-08-09 22:21:54,923 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 22:21:56,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8300, loss[loss=0.1143, beats_loss=0.01354, ecapa_loss=0.0003765, whisper_loss=0.09704, over 19064.00 frames. 
], tot_loss[loss=0.1161, beats_loss=0.01272, ecapa_loss=0.0003199, whisper_loss=0.1002, over 3897523.99 frames. ], batch size: 84, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:56,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=227980.0, ans=0.2 2024-08-09 22:21:59,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.849e+01 3.182e+01 3.709e+01 5.211e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-09 22:22:03,370 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-09 22:22:25,161 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 22:22:31,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=228180.0, ans=0.2 2024-08-09 22:22:34,288 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 22:22:42,170 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-09 22:22:47,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2024-08-09 22:22:54,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2024-08-09 22:23:10,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8350, loss[loss=0.08631, beats_loss=0.01262, ecapa_loss=0.0003049, whisper_loss=0.07064, over 13361.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01272, ecapa_loss=0.0003219, whisper_loss=0.1001, over 3915362.48 frames. 
], batch size: 54, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:23:36,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=228580.0, ans=0.125 2024-08-09 22:23:37,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=228580.0, ans=0.05 2024-08-09 22:23:50,793 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 22:24:09,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=15.0 2024-08-09 22:24:11,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=228880.0, ans=0.125 2024-08-09 22:24:26,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8400, loss[loss=0.1059, beats_loss=0.009461, ecapa_loss=0.0003195, whisper_loss=0.09325, over 15447.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01275, ecapa_loss=0.0003197, whisper_loss=0.101, over 3917973.57 frames. ], batch size: 58, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:24:29,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.962e+01 3.410e+01 4.213e+01 6.836e+01, threshold=6.819e+01, percent-clipped=3.0 2024-08-09 22:24:30,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-09 22:24:37,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228980.0, ans=0.1 2024-08-09 22:24:41,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.25 vs. 
limit=15.0 2024-08-09 22:24:48,163 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 22:24:48,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=229080.0, ans=0.0 2024-08-09 22:25:13,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-09 22:25:14,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=229280.0, ans=0.125 2024-08-09 22:25:18,868 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-09 22:25:19,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=229280.0, ans=0.0 2024-08-09 22:25:25,478 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 22:25:33,530 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-09 22:25:40,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229480.0, ans=0.1 2024-08-09 22:25:42,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8450, loss[loss=0.1422, beats_loss=0.01081, ecapa_loss=0.0003567, whisper_loss=0.1278, over 23675.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01266, ecapa_loss=0.0003223, whisper_loss=0.1012, over 3884309.96 frames. ], batch size: 93, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:25:56,296 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 22:26:15,781 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 22:26:34,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=229780.0, ans=0.125 2024-08-09 22:26:55,581 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 22:26:57,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229980.0, ans=0.125 2024-08-09 22:26:58,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8500, loss[loss=0.1001, beats_loss=0.01319, ecapa_loss=0.0003473, whisper_loss=0.08343, over 22105.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01281, ecapa_loss=0.0003226, whisper_loss=0.1003, over 3878692.61 frames. ], batch size: 95, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:27:00,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2024-08-09 22:27:00,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 3.102e+01 3.448e+01 4.001e+01 5.719e+01, threshold=6.896e+01, percent-clipped=0.0 2024-08-09 22:27:02,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-09 22:27:27,151 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 22:27:35,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=230180.0, ans=0.125 2024-08-09 22:27:40,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230180.0, ans=0.125 2024-08-09 22:27:46,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5 2024-08-09 22:27:48,838 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 22:27:56,731 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 22:28:11,571 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-09 22:28:12,705 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8550, loss[loss=0.1019, beats_loss=0.01267, ecapa_loss=0.0003625, whisper_loss=0.08563, over 21030.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0128, ecapa_loss=0.0003197, whisper_loss=0.1005, over 3911785.70 frames. ], batch size: 87, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:28:15,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=230480.0, ans=0.015 2024-08-09 22:28:25,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=230480.0, ans=0.125 2024-08-09 22:28:29,199 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-09 22:28:32,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. 
limit=15.0 2024-08-09 22:28:33,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5 2024-08-09 22:28:36,606 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 22:28:37,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2024-08-09 22:28:43,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=230680.0, ans=0.125 2024-08-09 22:28:52,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230680.0, ans=0.125 2024-08-09 22:29:00,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=230780.0, ans=0.2 2024-08-09 22:29:03,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230780.0, ans=0.125 2024-08-09 22:29:09,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=230780.0, ans=0.125 2024-08-09 22:29:20,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=230880.0, ans=0.0 2024-08-09 22:29:26,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8600, loss[loss=0.1259, beats_loss=0.01315, ecapa_loss=0.0002975, whisper_loss=0.1097, over 23069.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01276, ecapa_loss=0.0003209, whisper_loss=0.1007, over 3910000.81 frames. 
], batch size: 91, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:29:29,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.896e+01 3.419e+01 4.251e+01 8.504e+01, threshold=6.839e+01, percent-clipped=1.0 2024-08-09 22:29:31,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=230980.0, ans=0.125 2024-08-09 22:29:41,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=231080.0, ans=0.2 2024-08-09 22:29:49,735 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 22:29:55,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=231180.0, ans=0.125 2024-08-09 22:29:58,672 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-09 22:30:06,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231180.0, ans=0.1 2024-08-09 22:30:14,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=231280.0, ans=0.04949747468305833 2024-08-09 22:30:15,854 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 22:30:19,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231280.0, ans=0.1 2024-08-09 22:30:20,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=231280.0, ans=0.1 2024-08-09 22:30:38,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=231480.0, ans=0.0 2024-08-09 22:30:39,378 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8650, loss[loss=0.1061, beats_loss=0.01504, ecapa_loss=0.0003375, whisper_loss=0.0877, over 17685.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01274, ecapa_loss=0.0003229, whisper_loss=0.1008, over 3907561.27 frames. ], batch size: 74, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:30:57,262 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 22:31:00,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=231580.0, ans=0.0 2024-08-09 22:31:01,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.33 vs. limit=22.5 2024-08-09 22:31:10,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=231680.0, ans=0.125 2024-08-09 22:31:21,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.93 vs. 
limit=15.0 2024-08-09 22:31:22,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=231780.0, ans=0.125 2024-08-09 22:31:44,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=231880.0, ans=0.125 2024-08-09 22:31:49,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.33 vs. limit=22.5 2024-08-09 22:31:51,611 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8700, loss[loss=0.1297, beats_loss=0.01221, ecapa_loss=0.0003047, whisper_loss=0.1144, over 23158.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0128, ecapa_loss=0.0003252, whisper_loss=0.1004, over 3871665.63 frames. ], batch size: 91, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:31:54,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 3.097e+01 3.569e+01 4.188e+01 5.734e+01, threshold=7.139e+01, percent-clipped=0.0 2024-08-09 22:31:54,599 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 22:32:16,393 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 22:32:40,045 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-09 22:32:42,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=22.5 2024-08-09 22:33:07,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8750, loss[loss=0.1047, beats_loss=0.01182, ecapa_loss=0.0004425, whisper_loss=0.08843, over 18630.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.0128, ecapa_loss=0.0003259, whisper_loss=0.1001, over 3869575.64 frames. 
], batch size: 84, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:33:29,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232580.0, ans=0.1 2024-08-09 22:33:33,109 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-09 22:33:35,087 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 22:33:37,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0 2024-08-09 22:34:06,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=232880.0, ans=0.125 2024-08-09 22:34:19,996 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8800, loss[loss=0.1173, beats_loss=0.01261, ecapa_loss=0.0003305, whisper_loss=0.1013, over 22423.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01283, ecapa_loss=0.0003263, whisper_loss=0.1, over 3846189.49 frames. ], batch size: 90, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:34:23,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=15.0 2024-08-09 22:34:23,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 3.102e+01 3.612e+01 4.206e+01 6.577e+01, threshold=7.224e+01, percent-clipped=0.0 2024-08-09 22:34:47,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233080.0, ans=0.125 2024-08-09 22:34:54,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-09 22:35:08,758 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
22 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-09 22:35:14,386 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 22:35:16,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-09 22:35:17,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-09 22:35:20,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=233380.0, ans=0.0 2024-08-09 22:35:24,715 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 22:35:25,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=233380.0, ans=0.125 2024-08-09 22:35:34,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8850, loss[loss=0.1009, beats_loss=0.01209, ecapa_loss=0.0003788, whisper_loss=0.08505, over 15074.00 frames. ], tot_loss[loss=0.116, beats_loss=0.0128, ecapa_loss=0.0003255, whisper_loss=0.09994, over 3842843.25 frames. ], batch size: 64, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:35:37,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=233480.0, ans=0.0 2024-08-09 22:35:38,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=233480.0, ans=0.5 2024-08-09 22:35:47,942 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 22:36:05,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. 
limit=15.0 2024-08-09 22:36:13,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=233680.0, ans=0.125 2024-08-09 22:36:31,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=233880.0, ans=0.125 2024-08-09 22:36:36,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.91 vs. limit=15.0 2024-08-09 22:36:45,034 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8900, loss[loss=0.1155, beats_loss=0.01266, ecapa_loss=0.0003526, whisper_loss=0.09931, over 16202.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01281, ecapa_loss=0.0003237, whisper_loss=0.09985, over 3853740.29 frames. ], batch size: 64, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:36:47,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.807e+01 3.249e+01 3.699e+01 6.208e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-09 22:36:57,713 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 22:37:07,998 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 22:37:11,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2024-08-09 22:37:17,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.44 vs. 
limit=12.0 2024-08-09 22:37:34,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=234280.0, ans=0.125 2024-08-09 22:37:56,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 8950, loss[loss=0.1158, beats_loss=0.01541, ecapa_loss=0.0003132, whisper_loss=0.0973, over 21828.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01273, ecapa_loss=0.000324, whisper_loss=0.1006, over 3843510.25 frames. ], batch size: 89, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:37:59,132 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 22:37:59,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=234480.0, ans=0.125 2024-08-09 22:38:18,200 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-09 22:38:44,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=234780.0, ans=0.125 2024-08-09 22:38:47,075 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.022e-01 2024-08-09 22:39:00,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=234880.0, ans=0.0 2024-08-09 22:39:04,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9000, loss[loss=0.1243, beats_loss=0.0134, ecapa_loss=0.0002741, whisper_loss=0.1082, over 18205.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01265, ecapa_loss=0.0003252, whisper_loss=0.1014, over 3827914.00 frames. 
], batch size: 68, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:39:04,955 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 22:39:43,730 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on ASR_libri: loss=0.2806, beats_loss=0, ecapa_loss=0.0009572, whisper_loss=0.2711, over 922467.00 frames. 2024-08-09 22:40:01,171 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on SV_voxceleb1: loss=0.008746, beats_loss=0, ecapa_loss=0.0008746, whisper_loss=0, over 939242.00 frames. 2024-08-09 22:41:51,888 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on AT_audioset: loss=0.02976, beats_loss=0.02976, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 22:41:51,891 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-09 22:41:54,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 3.054e+01 3.477e+01 3.947e+01 5.844e+01, threshold=6.953e+01, percent-clipped=0.0 2024-08-09 22:42:07,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=235080.0, ans=0.0 2024-08-09 22:42:07,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=235080.0, ans=0.07 2024-08-09 22:42:11,944 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-09 22:42:33,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=235280.0, ans=0.125 2024-08-09 22:42:52,574 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-09 22:42:57,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=235380.0, ans=0.0 2024-08-09 22:43:01,115 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
24 from LS+wenet, 20 from Vox, 23 from AS 2024-08-09 22:43:04,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9050, loss[loss=0.09584, beats_loss=0.01343, ecapa_loss=0.0003196, whisper_loss=0.07922, over 15440.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01272, ecapa_loss=0.0003249, whisper_loss=0.1006, over 3828702.67 frames. ], batch size: 62, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:43:06,101 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 12 from LS+wenet, 18 from Vox, 23 from AS 2024-08-09 22:43:29,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=235580.0, ans=0.2 2024-08-09 22:43:47,239 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 22:43:55,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235780.0, ans=0.1 2024-08-09 22:44:04,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235880.0, ans=0.1 2024-08-09 22:44:08,668 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 from AS 2024-08-09 22:44:17,554 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9100, loss[loss=0.1466, beats_loss=0.01064, ecapa_loss=0.0003388, whisper_loss=0.1326, over 22989.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01259, ecapa_loss=0.0003244, whisper_loss=0.1014, over 3817455.67 frames. 
], batch size: 91, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:44:20,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.942e+01 3.415e+01 3.847e+01 6.703e+01, threshold=6.829e+01, percent-clipped=0.0 2024-08-09 22:44:24,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235980.0, ans=0.1 2024-08-09 22:44:25,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235980.0, ans=0.0 2024-08-09 22:44:26,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=235980.0, ans=0.125 2024-08-09 22:44:28,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=235980.0, ans=0.2 2024-08-09 22:45:05,722 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 from AS 2024-08-09 22:45:15,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2024-08-09 22:45:22,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=236380.0, ans=0.125 2024-08-09 22:45:34,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9150, loss[loss=0.1421, beats_loss=0.01011, ecapa_loss=0.0003537, whisper_loss=0.1285, over 23041.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01257, ecapa_loss=0.0003223, whisper_loss=0.1015, over 3866000.80 frames. 
], batch size: 93, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:45:38,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=236480.0, ans=0.125 2024-08-09 22:45:42,548 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.058e-02 2024-08-09 22:45:42,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=236480.0, ans=15.0 2024-08-09 22:45:43,395 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 from AS 2024-08-09 22:45:56,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.69 vs. limit=22.5 2024-08-09 22:46:04,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=236680.0, ans=0.0 2024-08-09 22:46:14,575 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 22 from Vox, 22 from AS 2024-08-09 22:46:14,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236680.0, ans=0.125 2024-08-09 22:46:17,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=236680.0, ans=0.0 2024-08-09 22:46:23,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=236780.0, ans=0.0 2024-08-09 22:46:25,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236780.0, ans=0.125 2024-08-09 22:46:37,479 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 17 from Vox, 35 from AS 2024-08-09 22:46:40,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-08-09 22:46:41,464 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 from AS 2024-08-09 22:46:47,179 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 from AS 2024-08-09 22:46:47,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=236980.0, ans=0.1 2024-08-09 22:46:48,548 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9200, loss[loss=0.1121, beats_loss=0.01393, ecapa_loss=0.0003453, whisper_loss=0.09474, over 15993.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.0126, ecapa_loss=0.0003228, whisper_loss=0.1013, over 3860854.85 frames. ], batch size: 64, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:46:51,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.835e+01 3.303e+01 3.887e+01 6.132e+01, threshold=6.605e+01, percent-clipped=0.0 2024-08-09 22:46:55,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=12.0 2024-08-09 22:47:36,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=237280.0, ans=0.0 2024-08-09 22:47:46,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=237380.0, ans=0.125 2024-08-09 22:47:51,871 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS 2024-08-09 22:48:04,340 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9250, loss[loss=0.1123, beats_loss=0.01323, ecapa_loss=0.0003384, whisper_loss=0.09569, over 19704.00 frames. 
], tot_loss[loss=0.1168, beats_loss=0.01254, ecapa_loss=0.0003232, whisper_loss=0.101, over 3884654.95 frames. ], batch size: 82, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:48:22,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=237580.0, ans=0.125 2024-08-09 22:48:48,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-09 22:48:52,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0 2024-08-09 22:49:04,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=237780.0, ans=0.125 2024-08-09 22:49:15,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-09 22:49:17,932 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-09 22:49:19,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=237880.0, ans=0.2 2024-08-09 22:49:24,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9300, loss[loss=0.1121, beats_loss=0.01256, ecapa_loss=0.0002971, whisper_loss=0.09655, over 22440.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01253, ecapa_loss=0.0003224, whisper_loss=0.1013, over 3905004.41 frames. 
], batch size: 92, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:49:25,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=237980.0, ans=0.5 2024-08-09 22:49:27,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.042e+01 3.380e+01 4.213e+01 8.159e+01, threshold=6.761e+01, percent-clipped=3.0 2024-08-09 22:49:38,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=238080.0, ans=0.0 2024-08-09 22:49:45,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-08-09 22:49:47,326 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 from AS 2024-08-09 22:50:14,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=238280.0, ans=0.07 2024-08-09 22:50:21,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=238280.0, ans=0.125 2024-08-09 22:50:32,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=238380.0, ans=0.125 2024-08-09 22:50:35,106 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 from AS 2024-08-09 22:50:41,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9350, loss[loss=0.1179, beats_loss=0.01166, ecapa_loss=0.0003669, whisper_loss=0.1026, over 22281.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01257, ecapa_loss=0.0003235, whisper_loss=0.1011, over 3892648.41 frames. 
], batch size: 91, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:50:46,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-09 22:50:47,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=238480.0, ans=0.125 2024-08-09 22:50:50,387 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 from AS 2024-08-09 22:51:00,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-08-09 22:51:32,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=238580.0, ans=0.0 2024-08-09 22:51:36,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=238680.0, ans=0.05 2024-08-09 22:51:59,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=238780.0, ans=0.125 2024-08-09 22:52:01,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=238780.0, ans=0.125 2024-08-09 22:52:02,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-08-09 22:52:04,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238780.0, ans=0.1 2024-08-09 22:52:09,250 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 22:52:15,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238780.0, ans=0.125 2024-08-09 22:52:34,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9400, loss[loss=0.1147, beats_loss=0.01636, ecapa_loss=0.0003744, whisper_loss=0.09463, over 15792.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01261, ecapa_loss=0.0003242, whisper_loss=0.1008, over 3886365.47 frames. ], batch size: 67, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:52:37,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.975e+01 3.274e+01 3.809e+01 6.351e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-09 22:52:44,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=238980.0, ans=0.125 2024-08-09 22:52:44,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2024-08-09 22:52:51,749 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 22:53:16,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=239180.0, ans=0.0 2024-08-09 22:53:33,795 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 from AS 2024-08-09 22:53:46,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=239380.0, ans=0.07 2024-08-09 22:53:53,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=239380.0, ans=0.125 2024-08-09 22:53:54,706 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 16 from Vox, 36 from AS 2024-08-09 22:54:06,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9450, loss[loss=0.09191, beats_loss=0.01299, ecapa_loss=0.0002516, whisper_loss=0.07641, over 15303.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01247, ecapa_loss=0.0003249, whisper_loss=0.1011, over 3853725.95 frames. ], batch size: 55, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:54:24,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2024-08-09 22:54:46,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=239580.0, ans=0.0 2024-08-09 22:55:09,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2024-08-09 22:55:15,635 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 13 from LS+wenet, 21 from Vox, 36 from AS 2024-08-09 22:55:15,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=239780.0, ans=0.125 2024-08-09 22:55:51,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9500, loss[loss=0.1083, beats_loss=0.01694, ecapa_loss=0.0003308, whisper_loss=0.08806, over 21147.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01257, ecapa_loss=0.000324, whisper_loss=0.1012, over 3876177.25 frames. 
], batch size: 89, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:55:53,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=239980.0, ans=0.0 2024-08-09 22:55:59,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.955e+01 3.513e+01 3.972e+01 7.065e+01, threshold=7.026e+01, percent-clipped=1.0 2024-08-09 22:56:11,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-09 22:56:20,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=240080.0, ans=0.0 2024-08-09 22:57:43,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-09 22:57:50,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9550, loss[loss=0.1078, beats_loss=0.01427, ecapa_loss=0.0003095, whisper_loss=0.09042, over 17703.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01265, ecapa_loss=0.0003247, whisper_loss=0.1, over 3885376.61 frames. ], batch size: 72, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:58:12,080 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 22:58:18,397 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 10 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 22:58:23,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=240580.0, ans=0.2 2024-08-09 22:58:49,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240680.0, ans=0.1 2024-08-09 22:59:18,204 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 29 from Vox, 32 from AS 2024-08-09 22:59:18,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=240780.0, ans=0.125 2024-08-09 22:59:23,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2024-08-09 22:59:24,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-09 22:59:27,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=240880.0, ans=0.07 2024-08-09 22:59:29,738 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 22:59:32,701 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 from AS 2024-08-09 22:59:38,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=240880.0, ans=0.125 2024-08-09 22:59:40,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=240880.0, ans=0.2 2024-08-09 22:59:46,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9600, loss[loss=0.1408, beats_loss=0.01091, ecapa_loss=0.0002746, whisper_loss=0.1272, over 23785.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01259, ecapa_loss=0.0003251, whisper_loss=0.1004, over 3889926.71 frames. ], batch size: 87, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:59:49,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.841e+01 3.249e+01 3.780e+01 5.366e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-09 23:00:07,480 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
26 from LS+wenet, 19 from Vox, 31 from AS 2024-08-09 23:00:16,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=241080.0, ans=0.05 2024-08-09 23:00:23,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=241080.0, ans=0.0 2024-08-09 23:00:24,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=241080.0, ans=0.07 2024-08-09 23:00:36,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=241180.0, ans=0.125 2024-08-09 23:00:51,370 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 23:00:57,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241280.0, ans=0.1 2024-08-09 23:01:27,369 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 from AS 2024-08-09 23:01:32,321 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 23:01:32,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=241480.0, ans=0.125 2024-08-09 23:01:33,891 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9650, loss[loss=0.127, beats_loss=0.01351, ecapa_loss=0.0003717, whisper_loss=0.1098, over 22535.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01262, ecapa_loss=0.0003261, whisper_loss=0.1003, over 3846468.24 frames. ], batch size: 92, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:01:34,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=241480.0, ans=0.125 2024-08-09 23:01:40,789 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 24 from Vox, 24 from AS 2024-08-09 23:01:42,659 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 from AS 2024-08-09 23:01:50,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=241580.0, ans=0.125 2024-08-09 23:01:53,368 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 12 from LS+wenet, 22 from Vox, 19 from AS 2024-08-09 23:02:22,633 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS 2024-08-09 23:02:42,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241880.0, ans=0.1 2024-08-09 23:02:43,353 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 from AS 2024-08-09 23:02:49,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-08-09 23:02:52,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.90 vs. limit=10.0 2024-08-09 23:02:55,515 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS 2024-08-09 23:02:58,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9700, loss[loss=0.1089, beats_loss=0.01303, ecapa_loss=0.0003324, whisper_loss=0.09256, over 22415.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01259, ecapa_loss=0.0003301, whisper_loss=0.09999, over 3817640.80 frames. 
], batch size: 92, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:03:01,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 3.064e+01 3.484e+01 4.019e+01 6.587e+01, threshold=6.968e+01, percent-clipped=2.0 2024-08-09 23:03:01,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241980.0, ans=0.1 2024-08-09 23:03:06,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=241980.0, ans=0.125 2024-08-09 23:03:08,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=241980.0, ans=0.0 2024-08-09 23:03:15,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=242080.0, ans=0.125 2024-08-09 23:03:20,782 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 from AS 2024-08-09 23:03:21,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=242080.0, ans=0.0 2024-08-09 23:03:29,081 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 from AS 2024-08-09 23:03:44,695 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 from AS 2024-08-09 23:03:52,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=242280.0, ans=0.0 2024-08-09 23:03:58,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=242280.0, ans=0.0 2024-08-09 23:04:03,044 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 from AS 2024-08-09 23:04:21,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9750, loss[loss=0.1275, beats_loss=0.01193, ecapa_loss=0.0003257, whisper_loss=0.1123, over 17934.00 frames. 
], tot_loss[loss=0.1152, beats_loss=0.0126, ecapa_loss=0.0003285, whisper_loss=0.09934, over 3799835.71 frames. ], batch size: 71, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:04:21,886 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 from AS 2024-08-09 23:04:35,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=242480.0, ans=0.025 2024-08-09 23:04:46,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=242580.0, ans=0.0 2024-08-09 23:04:46,329 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:04:47,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=242580.0, ans=0.5 2024-08-09 23:04:49,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2024-08-09 23:04:54,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242680.0, ans=0.1 2024-08-09 23:04:54,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.94 vs. limit=22.5 2024-08-09 23:04:56,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=242680.0, ans=0.125 2024-08-09 23:05:00,941 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 from AS 2024-08-09 23:05:04,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.91 vs. 
limit=12.0 2024-08-09 23:05:15,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2024-08-09 23:05:21,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=242780.0, ans=0.2 2024-08-09 23:05:26,097 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 from AS 2024-08-09 23:05:30,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242880.0, ans=0.1 2024-08-09 23:05:34,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=242880.0, ans=0.125 2024-08-09 23:05:41,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9800, loss[loss=0.08124, beats_loss=0.01582, ecapa_loss=0.0002834, whisper_loss=0.06258, over 13934.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01262, ecapa_loss=0.0003255, whisper_loss=0.09941, over 3838772.16 frames. 
], batch size: 55, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:05:44,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.875e+01 3.358e+01 3.972e+01 6.084e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-09 23:05:49,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=242980.0, ans=0.125 2024-08-09 23:06:14,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243180.0, ans=0.1 2024-08-09 23:06:43,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=243280.0, ans=0.125 2024-08-09 23:07:05,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9850, loss[loss=0.1221, beats_loss=0.01223, ecapa_loss=0.0002508, whisper_loss=0.1074, over 19956.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0127, ecapa_loss=0.0003252, whisper_loss=0.09901, over 3853691.51 frames. ], batch size: 76, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:07:10,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=243480.0, ans=0.0 2024-08-09 23:07:46,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=243680.0, ans=0.125 2024-08-09 23:07:53,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243680.0, ans=0.1 2024-08-09 23:08:07,880 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
35 from LS+wenet, 17 from Vox, 43 from AS 2024-08-09 23:08:09,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=243780.0, ans=0.0 2024-08-09 23:08:11,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243780.0, ans=0.0 2024-08-09 23:08:21,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=243880.0, ans=0.05 2024-08-09 23:08:30,914 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 17 from Vox, 35 from AS 2024-08-09 23:08:32,637 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 from AS 2024-08-09 23:08:33,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9900, loss[loss=0.1246, beats_loss=0.01327, ecapa_loss=0.0003307, whisper_loss=0.1081, over 13702.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.0127, ecapa_loss=0.0003229, whisper_loss=0.09999, over 3872732.71 frames. ], batch size: 56, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:08:36,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 3.025e+01 3.445e+01 3.906e+01 6.336e+01, threshold=6.890e+01, percent-clipped=0.0 2024-08-09 23:08:38,955 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 10 from Vox, 38 from AS 2024-08-09 23:08:55,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=244080.0, ans=0.125 2024-08-09 23:09:18,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=244180.0, ans=0.125 2024-08-09 23:09:39,391 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 19 from Vox, 22 from AS 2024-08-09 23:09:43,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244380.0, ans=0.1 2024-08-09 23:09:55,063 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 9950, loss[loss=0.1123, beats_loss=0.01107, ecapa_loss=0.0003396, whisper_loss=0.0978, over 14753.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01269, ecapa_loss=0.0003242, whisper_loss=0.09951, over 3854651.06 frames. ], batch size: 59, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:09:59,133 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 23:10:14,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=244580.0, ans=0.125 2024-08-09 23:10:38,987 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 from AS 2024-08-09 23:10:59,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2024-08-09 23:11:00,008 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 23:11:00,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=244880.0, ans=0.125 2024-08-09 23:11:02,102 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 15 from Vox, 34 from AS 2024-08-09 23:11:18,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10000, loss[loss=0.1225, beats_loss=0.01182, ecapa_loss=0.000345, whisper_loss=0.1073, over 22806.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01266, ecapa_loss=0.000325, whisper_loss=0.09916, over 3848752.81 frames. 
], batch size: 91, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:11:22,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.876e+01 3.207e+01 3.745e+01 5.513e+01, threshold=6.413e+01, percent-clipped=0.0 2024-08-09 23:11:24,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=12.0 2024-08-09 23:11:34,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.0 2024-08-09 23:11:48,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=245080.0, ans=0.125 2024-08-09 23:12:20,479 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 23:12:21,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-09 23:12:21,722 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 23:12:24,466 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 23:12:35,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=245380.0, ans=0.125 2024-08-09 23:12:36,242 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 23:12:46,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-08-09 23:12:50,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10050, loss[loss=0.1138, beats_loss=0.01171, ecapa_loss=0.000324, whisper_loss=0.09881, over 21304.00 frames. 
], tot_loss[loss=0.1154, beats_loss=0.01257, ecapa_loss=0.0003251, whisper_loss=0.09956, over 3870657.03 frames. ], batch size: 85, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:12:58,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=245480.0, ans=15.0 2024-08-09 23:13:20,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=245580.0, ans=0.0 2024-08-09 23:13:23,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-09 23:13:27,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=245680.0, ans=0.125 2024-08-09 23:13:48,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=245780.0, ans=0.2 2024-08-09 23:13:56,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245780.0, ans=0.125 2024-08-09 23:14:01,533 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 23:14:02,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-08-09 23:14:09,108 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 23:14:14,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=245880.0, ans=0.125 2024-08-09 23:14:19,225 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
27 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-09 23:14:19,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=245880.0, ans=0.2 2024-08-09 23:14:24,036 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10100, loss[loss=0.1217, beats_loss=0.01635, ecapa_loss=0.0002135, whisper_loss=0.1032, over 17322.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01262, ecapa_loss=0.0003255, whisper_loss=0.1002, over 3894145.76 frames. ], batch size: 65, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:14:28,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.998e+01 3.344e+01 3.820e+01 6.746e+01, threshold=6.687e+01, percent-clipped=3.0 2024-08-09 23:14:30,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=245980.0, ans=0.0 2024-08-09 23:14:35,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=245980.0, ans=0.125 2024-08-09 23:14:39,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=246080.0, ans=0.2 2024-08-09 23:14:47,651 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-09 23:14:55,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246080.0, ans=0.1 2024-08-09 23:14:58,183 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-09 23:15:00,679 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 23:15:23,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=246280.0, ans=0.0 2024-08-09 23:15:28,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=246380.0, ans=0.2 2024-08-09 23:15:40,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=246380.0, ans=0.125 2024-08-09 23:15:43,276 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10150, loss[loss=0.1369, beats_loss=0.01322, ecapa_loss=0.0002838, whisper_loss=0.1209, over 23612.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01269, ecapa_loss=0.0003254, whisper_loss=0.09955, over 3914179.96 frames. ], batch size: 92, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:15:54,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-09 23:16:03,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-08-09 23:16:05,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=246580.0, ans=0.125 2024-08-09 23:16:10,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246580.0, ans=0.1 2024-08-09 23:16:13,874 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 23:16:21,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=246680.0, ans=0.1 2024-08-09 23:16:22,289 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=15.0 2024-08-09 23:16:44,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=246880.0, ans=12.0 2024-08-09 23:16:53,706 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-09 23:16:57,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10200, loss[loss=0.1384, beats_loss=0.009609, ecapa_loss=0.0003864, whisper_loss=0.1249, over 22726.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01263, ecapa_loss=0.0003254, whisper_loss=0.09978, over 3912852.52 frames. ], batch size: 91, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:17:00,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.913e+01 3.327e+01 3.843e+01 5.703e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 23:17:14,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=247080.0, ans=0.0 2024-08-09 23:17:23,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247080.0, ans=0.1 2024-08-09 23:17:30,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=247180.0, ans=0.125 2024-08-09 23:17:32,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=247180.0, ans=0.5 2024-08-09 23:17:37,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, 
num_channels=384, metric=8.04 vs. limit=15.0 2024-08-09 23:17:42,436 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 23:17:44,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=247280.0, ans=0.0 2024-08-09 23:18:06,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-08-09 23:18:09,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10250, loss[loss=0.08444, beats_loss=0.01466, ecapa_loss=0.000233, whisper_loss=0.06745, over 18141.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.0126, ecapa_loss=0.0003237, whisper_loss=0.09965, over 3929896.20 frames. ], batch size: 72, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:18:32,222 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 23:18:34,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.76 vs. limit=10.0 2024-08-09 23:18:40,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=247680.0, ans=0.125 2024-08-09 23:19:10,860 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 23:19:12,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=247880.0, ans=0.0 2024-08-09 23:19:19,322 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-09 23:19:21,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10300, loss[loss=0.1065, beats_loss=0.01399, ecapa_loss=0.0002956, whisper_loss=0.08953, over 22538.00 frames. 
], tot_loss[loss=0.1151, beats_loss=0.01267, ecapa_loss=0.0003236, whisper_loss=0.09923, over 3931246.92 frames. ], batch size: 92, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:19:23,561 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-09 23:19:25,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.179e+01 3.546e+01 4.118e+01 7.373e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-09 23:19:26,884 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 23:19:53,705 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 23:20:04,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248280.0, ans=0.125 2024-08-09 23:20:33,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=248480.0, ans=0.2 2024-08-09 23:20:34,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10350, loss[loss=0.0958, beats_loss=0.01478, ecapa_loss=0.0002835, whisper_loss=0.07818, over 21028.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01273, ecapa_loss=0.0003207, whisper_loss=0.09945, over 3943153.01 frames. 
], batch size: 85, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:20:34,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=248480.0, ans=0.2 2024-08-09 23:20:45,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=248480.0, ans=0.2 2024-08-09 23:21:03,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=248680.0, ans=0.0 2024-08-09 23:21:07,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=248680.0, ans=0.1 2024-08-09 23:21:11,057 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:21:14,940 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 23:21:15,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=248680.0, ans=0.125 2024-08-09 23:21:19,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=248780.0, ans=0.09899494936611666 2024-08-09 23:21:22,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248780.0, ans=0.125 2024-08-09 23:21:46,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10400, loss[loss=0.1021, beats_loss=0.008097, ecapa_loss=0.0004116, whisper_loss=0.08988, over 15985.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01277, ecapa_loss=0.0003198, whisper_loss=0.09885, over 3918532.75 frames. 
], batch size: 67, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:21:46,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=248980.0, ans=0.125 2024-08-09 23:21:48,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.757e+01 3.226e+01 3.794e+01 6.112e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-09 23:21:56,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-09 23:21:59,245 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.799e-01 2024-08-09 23:22:07,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2024-08-09 23:22:23,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249180.0, ans=0.125 2024-08-09 23:22:23,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.34 vs. limit=22.5 2024-08-09 23:22:29,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=249280.0, ans=0.1 2024-08-09 23:22:32,977 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 23:22:37,560 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 23:22:42,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=249380.0, ans=0.125 2024-08-09 23:22:54,647 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10450, loss[loss=0.1233, beats_loss=0.01109, ecapa_loss=0.0003604, whisper_loss=0.1086, over 20378.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01276, ecapa_loss=0.0003195, whisper_loss=0.09864, over 3895139.13 frames. ], batch size: 83, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:22:56,390 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 23:23:12,662 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-09 23:23:25,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=249680.0, ans=0.2 2024-08-09 23:23:28,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=249680.0, ans=0.125 2024-08-09 23:23:40,080 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 23:23:41,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-09 23:23:43,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=249780.0, ans=0.0 2024-08-09 23:23:52,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=249880.0, ans=0.0 2024-08-09 23:24:02,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10500, loss[loss=0.07852, beats_loss=0.01445, ecapa_loss=0.0002601, whisper_loss=0.06147, over 16510.00 frames. 
], tot_loss[loss=0.1149, beats_loss=0.01266, ecapa_loss=0.0003192, whisper_loss=0.09906, over 3868121.18 frames. ], batch size: 65, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:24:04,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2024-08-09 23:24:05,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.948e+01 3.458e+01 4.084e+01 6.883e+01, threshold=6.915e+01, percent-clipped=1.0 2024-08-09 23:24:18,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=250080.0, ans=0.125 2024-08-09 23:24:20,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=250080.0, ans=0.1 2024-08-09 23:24:36,893 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-09 23:24:41,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250180.0, ans=0.125 2024-08-09 23:24:46,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=250280.0, ans=0.125 2024-08-09 23:24:53,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250280.0, ans=0.1 2024-08-09 23:24:56,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=250280.0, ans=0.09899494936611666 2024-08-09 23:24:57,558 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 23:24:58,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=11.22 vs. 
limit=10.0 2024-08-09 23:25:01,784 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 13 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 23:25:03,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=250380.0, ans=0.1 2024-08-09 23:25:13,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10550, loss[loss=0.1276, beats_loss=0.01216, ecapa_loss=0.0003424, whisper_loss=0.112, over 22490.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01271, ecapa_loss=0.0003192, whisper_loss=0.09878, over 3880520.83 frames. ], batch size: 89, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:25:28,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=250580.0, ans=0.0 2024-08-09 23:25:35,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=250580.0, ans=0.2 2024-08-09 23:25:47,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=250680.0, ans=0.125 2024-08-09 23:25:49,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2024-08-09 23:26:02,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=250780.0, ans=0.125 2024-08-09 23:26:22,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.00 vs. limit=22.5 2024-08-09 23:26:22,719 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10600, loss[loss=0.1185, beats_loss=0.01174, ecapa_loss=0.000319, whisper_loss=0.1036, over 22470.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01278, ecapa_loss=0.0003173, whisper_loss=0.09941, over 3890645.34 frames. 
], batch size: 91, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:26:25,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 3.120e+01 3.519e+01 3.971e+01 7.530e+01, threshold=7.037e+01, percent-clipped=1.0 2024-08-09 23:26:28,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.86 vs. limit=10.0 2024-08-09 23:26:31,151 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-09 23:26:40,096 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-09 23:26:52,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=251180.0, ans=0.0 2024-08-09 23:26:56,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-08-09 23:26:59,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=251180.0, ans=0.2 2024-08-09 23:27:07,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=22.5 2024-08-09 23:27:12,129 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 23:27:12,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=251280.0, ans=0.1 2024-08-09 23:27:12,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-09 23:27:16,134 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
28 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-09 23:27:32,184 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10650, loss[loss=0.1289, beats_loss=0.01161, ecapa_loss=0.0002954, whisper_loss=0.1143, over 23748.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01265, ecapa_loss=0.0003179, whisper_loss=0.09995, over 3862331.99 frames. ], batch size: 90, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:27:38,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=251480.0, ans=0.0 2024-08-09 23:27:39,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.54 vs. limit=15.0 2024-08-09 23:27:59,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=251680.0, ans=0.125 2024-08-09 23:28:09,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=251680.0, ans=10.0 2024-08-09 23:28:11,938 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-09 23:28:12,388 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 23:28:15,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=251780.0, ans=0.0 2024-08-09 23:28:17,107 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-09 23:28:41,431 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10700, loss[loss=0.1291, beats_loss=0.0091, ecapa_loss=0.0004095, whisper_loss=0.1159, over 15470.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01272, ecapa_loss=0.0003155, whisper_loss=0.1003, over 3856375.45 frames. 
], batch size: 62, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:28:44,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.278e+01 2.878e+01 3.295e+01 3.921e+01 5.869e+01, threshold=6.590e+01, percent-clipped=0.0 2024-08-09 23:28:53,854 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 23:28:58,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=252080.0, ans=0.125 2024-08-09 23:29:11,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-08-09 23:29:20,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=8.0 2024-08-09 23:29:51,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10750, loss[loss=0.1062, beats_loss=0.01146, ecapa_loss=0.0003474, whisper_loss=0.09126, over 20132.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01271, ecapa_loss=0.0003161, whisper_loss=0.1001, over 3872409.19 frames. ], batch size: 85, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:29:56,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.59 vs. 
limit=22.5 2024-08-09 23:30:11,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=252580.0, ans=0.0 2024-08-09 23:30:14,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=252580.0, ans=0.0 2024-08-09 23:30:15,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=252580.0, ans=0.125 2024-08-09 23:30:38,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=252780.0, ans=0.0 2024-08-09 23:30:54,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=12.0 2024-08-09 23:31:00,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10800, loss[loss=0.1164, beats_loss=0.01064, ecapa_loss=0.0003701, whisper_loss=0.102, over 21567.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01262, ecapa_loss=0.0003178, whisper_loss=0.1006, over 3875459.15 frames. 
], batch size: 90, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:31:03,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.032e+01 3.349e+01 3.769e+01 6.080e+01, threshold=6.698e+01, percent-clipped=0.0 2024-08-09 23:31:17,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=253080.0, ans=0.07 2024-08-09 23:31:28,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=253180.0, ans=0.125 2024-08-09 23:31:48,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=253280.0, ans=0.2 2024-08-09 23:31:51,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=253280.0, ans=0.125 2024-08-09 23:31:54,400 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 23:32:07,620 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10850, loss[loss=0.1198, beats_loss=0.01349, ecapa_loss=0.0002841, whisper_loss=0.1035, over 22512.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01262, ecapa_loss=0.0003185, whisper_loss=0.1011, over 3912461.35 frames. ], batch size: 88, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:32:16,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=253480.0, ans=0.0 2024-08-09 23:32:18,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=253480.0, ans=0.0 2024-08-09 23:32:25,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=253580.0, ans=0.2 2024-08-09 23:32:27,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. 
limit=15.0 2024-08-09 23:32:31,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=253580.0, ans=0.1 2024-08-09 23:32:32,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=253580.0, ans=0.125 2024-08-09 23:32:41,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=253680.0, ans=0.2 2024-08-09 23:32:48,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=253780.0, ans=0.2 2024-08-09 23:32:50,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.52 vs. limit=15.0 2024-08-09 23:32:53,721 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 23:32:55,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=253780.0, ans=0.035 2024-08-09 23:33:15,448 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10900, loss[loss=0.1167, beats_loss=0.01387, ecapa_loss=0.0003493, whisper_loss=0.09933, over 22423.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01255, ecapa_loss=0.0003197, whisper_loss=0.1013, over 3932727.55 frames. ], batch size: 93, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:33:18,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.959e+01 3.403e+01 3.969e+01 5.664e+01, threshold=6.807e+01, percent-clipped=0.0 2024-08-09 23:33:21,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=253980.0, ans=0.5 2024-08-09 23:33:33,572 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 23:33:33,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=254080.0, ans=0.1 2024-08-09 23:34:04,170 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 23:34:15,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=254380.0, ans=0.125 2024-08-09 23:34:21,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=254480.0, ans=0.09899494936611666 2024-08-09 23:34:22,543 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 10950, loss[loss=0.1104, beats_loss=0.01399, ecapa_loss=0.0003335, whisper_loss=0.09309, over 15222.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01262, ecapa_loss=0.000318, whisper_loss=0.1013, over 3912086.91 frames. ], batch size: 61, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:34:26,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=254480.0, ans=0.2 2024-08-09 23:34:27,445 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 23:34:28,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=254480.0, ans=0.0 2024-08-09 23:34:28,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=254480.0, ans=0.125 2024-08-09 23:34:44,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-09 23:34:48,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.95 vs. 
limit=22.5 2024-08-09 23:35:11,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2024-08-09 23:35:20,350 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-09 23:35:28,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=254880.0, ans=0.125 2024-08-09 23:35:30,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11000, loss[loss=0.1458, beats_loss=0.009283, ecapa_loss=0.0003248, whisper_loss=0.1333, over 18851.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.0126, ecapa_loss=0.0003182, whisper_loss=0.1014, over 3923312.44 frames. ], batch size: 73, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:35:33,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.844e+01 3.291e+01 3.745e+01 5.513e+01, threshold=6.582e+01, percent-clipped=0.0 2024-08-09 23:35:51,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=255080.0, ans=0.2 2024-08-09 23:36:06,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=255180.0, ans=0.0 2024-08-09 23:36:30,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=255380.0, ans=0.2 2024-08-09 23:36:35,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2024-08-09 23:36:41,284 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11050, loss[loss=0.1117, beats_loss=0.01584, ecapa_loss=0.0002456, whisper_loss=0.09338, over 21338.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01258, ecapa_loss=0.0003204, whisper_loss=0.1012, over 3908471.75 frames. 
], batch size: 84, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:36:48,572 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 23:37:10,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=255680.0, ans=0.125 2024-08-09 23:37:16,033 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-09 23:37:33,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=255780.0, ans=0.2 2024-08-09 23:37:33,806 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 23:37:44,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=255880.0, ans=0.07 2024-08-09 23:37:50,401 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11100, loss[loss=0.1198, beats_loss=0.01419, ecapa_loss=0.0002419, whisper_loss=0.1032, over 21902.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.0126, ecapa_loss=0.000318, whisper_loss=0.101, over 3902317.05 frames. ], batch size: 83, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:37:53,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+01 3.083e+01 3.527e+01 4.357e+01 6.576e+01, threshold=7.054e+01, percent-clipped=0.0 2024-08-09 23:38:13,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-09 23:38:18,765 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 23:38:26,973 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-09 23:38:28,366 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
33 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 23:38:30,967 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-09 23:38:49,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=256380.0, ans=0.2 2024-08-09 23:38:59,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11150, loss[loss=0.1248, beats_loss=0.01202, ecapa_loss=0.0003027, whisper_loss=0.1098, over 23467.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01249, ecapa_loss=0.000319, whisper_loss=0.1009, over 3863105.53 frames. ], batch size: 93, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:39:02,550 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-09 23:39:05,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=256480.0, ans=0.125 2024-08-09 23:39:09,431 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 23:39:10,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-09 23:39:22,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=256580.0, ans=0.125 2024-08-09 23:39:27,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=256680.0, ans=0.0 2024-08-09 23:39:39,073 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-09 23:39:48,937 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 23:39:49,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=256780.0, ans=0.125 2024-08-09 23:39:51,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=256780.0, ans=0.05 2024-08-09 23:39:57,195 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 23:39:57,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256880.0, ans=0.125 2024-08-09 23:40:09,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11200, loss[loss=0.1023, beats_loss=0.01444, ecapa_loss=0.0002548, whisper_loss=0.08529, over 22492.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01245, ecapa_loss=0.000319, whisper_loss=0.1015, over 3873102.02 frames. ], batch size: 89, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:40:12,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.109e+01 3.535e+01 4.149e+01 6.453e+01, threshold=7.070e+01, percent-clipped=0.0 2024-08-09 23:40:18,101 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 23:40:41,199 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-09 23:40:56,033 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 31 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-09 23:41:02,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. 
limit=15.0 2024-08-09 23:41:19,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11250, loss[loss=0.1401, beats_loss=0.01223, ecapa_loss=0.000339, whisper_loss=0.1244, over 23003.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01254, ecapa_loss=0.0003217, whisper_loss=0.1012, over 3856377.32 frames. ], batch size: 88, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:41:21,317 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 23:41:29,316 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-08-09 23:41:45,059 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:41:57,685 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 23:42:00,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257780.0, ans=0.125 2024-08-09 23:42:00,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=257780.0, ans=0.025 2024-08-09 23:42:13,919 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-09 23:42:22,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=257880.0, ans=0.1 2024-08-09 23:42:27,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-09 23:42:28,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11300, loss[loss=0.1146, beats_loss=0.01635, ecapa_loss=0.0002993, whisper_loss=0.09526, over 19160.00 frames. 
], tot_loss[loss=0.1172, beats_loss=0.01254, ecapa_loss=0.0003194, whisper_loss=0.1015, over 3896442.52 frames. ], batch size: 81, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:42:31,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 3.110e+01 3.449e+01 4.025e+01 6.550e+01, threshold=6.899e+01, percent-clipped=0.0 2024-08-09 23:42:46,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=258080.0, ans=0.2 2024-08-09 23:42:48,996 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 23:43:01,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=258180.0, ans=0.0 2024-08-09 23:43:30,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-09 23:43:36,189 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11350, loss[loss=0.1547, beats_loss=0.009203, ecapa_loss=0.0003302, whisper_loss=0.1422, over 19166.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01248, ecapa_loss=0.0003187, whisper_loss=0.1013, over 3911921.92 frames. ], batch size: 74, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:43:37,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. 
limit=22.5 2024-08-09 23:43:43,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=258480.0, ans=0.125 2024-08-09 23:43:46,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258480.0, ans=0.1 2024-08-09 23:43:46,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=258480.0, ans=0.2 2024-08-09 23:43:57,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-08-09 23:44:02,943 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 23:44:16,508 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 23:44:24,856 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 23:44:25,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=258780.0, ans=0.125 2024-08-09 23:44:34,641 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 14 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 23:44:37,631 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.882e-02 2024-08-09 23:44:44,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11400, loss[loss=0.0979, beats_loss=0.015, ecapa_loss=0.0002848, whisper_loss=0.08005, over 22753.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01251, ecapa_loss=0.0003174, whisper_loss=0.1014, over 3912563.78 frames. 
], batch size: 93, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:44:45,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2024-08-09 23:44:45,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=258980.0, ans=15.0 2024-08-09 23:44:46,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=258980.0, ans=0.125 2024-08-09 23:44:47,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.889e+01 3.232e+01 3.833e+01 5.860e+01, threshold=6.464e+01, percent-clipped=0.0 2024-08-09 23:44:48,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=258980.0, ans=0.0 2024-08-09 23:44:49,472 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 23:44:59,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-09 23:45:08,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=259080.0, ans=0.125 2024-08-09 23:45:20,361 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-09 23:45:24,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=259180.0, ans=10.0 2024-08-09 23:45:31,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=259280.0, ans=0.0 2024-08-09 23:45:33,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259280.0, ans=0.1 2024-08-09 23:45:45,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-08-09 23:45:58,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11450, loss[loss=0.1119, beats_loss=0.01387, ecapa_loss=0.0003369, whisper_loss=0.09465, over 21336.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01262, ecapa_loss=0.0003176, whisper_loss=0.1015, over 3927877.19 frames. ], batch size: 89, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:46:27,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=259680.0, ans=0.125 2024-08-09 23:46:54,380 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 23:47:08,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11500, loss[loss=0.1097, beats_loss=0.0127, ecapa_loss=0.0003156, whisper_loss=0.09382, over 16994.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01257, ecapa_loss=0.0003174, whisper_loss=0.1015, over 3921061.18 frames. 
], batch size: 69, lr: 2.32e-02, grad_scale: 131072.0 2024-08-09 23:47:11,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 3.028e+01 3.430e+01 4.047e+01 6.324e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 23:47:17,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=259980.0, ans=0.0 2024-08-09 23:47:29,544 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 23:47:38,897 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 38 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 23:47:39,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=260180.0, ans=0.125 2024-08-09 23:47:41,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0 2024-08-09 23:47:41,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-09 23:47:44,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=12.0 2024-08-09 23:47:49,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260280.0, ans=0.125 2024-08-09 23:48:03,558 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-09 23:48:12,448 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 23:48:17,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11550, loss[loss=0.1068, beats_loss=0.01238, ecapa_loss=0.000307, whisper_loss=0.0914, over 23043.00 frames. 
], tot_loss[loss=0.1175, beats_loss=0.0126, ecapa_loss=0.0003157, whisper_loss=0.1018, over 3938654.40 frames. ], batch size: 92, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:48:17,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=260480.0, ans=0.125 2024-08-09 23:48:37,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=12.0 2024-08-09 23:48:39,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=260580.0, ans=0.5 2024-08-09 23:48:42,370 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-09 23:48:44,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-08-09 23:48:56,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=260680.0, ans=0.09899494936611666 2024-08-09 23:49:26,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11600, loss[loss=0.127, beats_loss=0.01291, ecapa_loss=0.0003118, whisper_loss=0.111, over 22339.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01253, ecapa_loss=0.0003157, whisper_loss=0.102, over 3930324.59 frames. ], batch size: 88, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:49:29,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.873e+01 3.365e+01 3.781e+01 5.038e+01, threshold=6.731e+01, percent-clipped=0.0 2024-08-09 23:49:36,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=260980.0, ans=0.07 2024-08-09 23:49:44,078 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 23:49:45,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=261080.0, ans=0.125 2024-08-09 23:50:15,786 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 23:50:25,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=261380.0, ans=0.0 2024-08-09 23:50:37,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11650, loss[loss=0.127, beats_loss=0.01173, ecapa_loss=0.0002887, whisper_loss=0.1124, over 15213.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01249, ecapa_loss=0.0003169, whisper_loss=0.102, over 3946645.74 frames. ], batch size: 59, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:50:40,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261480.0, ans=0.1 2024-08-09 23:50:45,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=261480.0, ans=0.0 2024-08-09 23:50:57,257 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 23:51:01,742 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 23:51:04,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=261680.0, ans=0.125 2024-08-09 23:51:07,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.52 vs. 
limit=22.5 2024-08-09 23:51:16,119 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:51:21,702 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 23:51:38,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261880.0, ans=0.125 2024-08-09 23:51:38,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261880.0, ans=0.1 2024-08-09 23:51:46,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11700, loss[loss=0.09173, beats_loss=0.01428, ecapa_loss=0.000274, whisper_loss=0.07471, over 18732.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.0126, ecapa_loss=0.0003168, whisper_loss=0.1017, over 3952009.96 frames. ], batch size: 73, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:51:49,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 3.059e+01 3.535e+01 4.179e+01 1.066e+02, threshold=7.070e+01, percent-clipped=1.0 2024-08-09 23:51:51,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. 
limit=22.5 2024-08-09 23:51:54,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=261980.0, ans=0.125 2024-08-09 23:51:59,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=262080.0, ans=0.125 2024-08-09 23:52:04,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=262080.0, ans=0.125 2024-08-09 23:52:07,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=262080.0, ans=0.125 2024-08-09 23:52:10,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=262080.0, ans=0.0 2024-08-09 23:52:10,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-08-09 23:52:13,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=262180.0, ans=0.125 2024-08-09 23:52:15,323 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 23:52:33,056 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 23:52:34,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=262280.0, ans=0.2 2024-08-09 23:52:46,910 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 23:52:52,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262380.0, ans=0.125 2024-08-09 23:52:54,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11750, loss[loss=0.113, beats_loss=0.01633, ecapa_loss=0.0002039, whisper_loss=0.09463, over 23741.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01264, ecapa_loss=0.0003197, whisper_loss=0.101, over 3938922.79 frames. ], batch size: 93, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:53:03,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=262480.0, ans=0.125 2024-08-09 23:53:11,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=262580.0, ans=0.125 2024-08-09 23:53:17,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2024-08-09 23:53:48,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=262880.0, ans=0.07 2024-08-09 23:53:54,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=262880.0, ans=0.0 2024-08-09 23:53:54,933 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 23:53:58,885 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 23:54:00,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=262880.0, ans=0.0 2024-08-09 23:54:02,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11800, loss[loss=0.09793, beats_loss=0.01421, ecapa_loss=0.0003005, whisper_loss=0.08072, over 15488.00 frames. 
], tot_loss[loss=0.1168, beats_loss=0.01257, ecapa_loss=0.0003182, whisper_loss=0.101, over 3910935.64 frames. ], batch size: 62, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:54:05,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 3.014e+01 3.516e+01 4.289e+01 8.691e+01, threshold=7.033e+01, percent-clipped=2.0 2024-08-09 23:54:06,491 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.310e-01 2024-08-09 23:54:08,729 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 23:54:29,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=263180.0, ans=0.125 2024-08-09 23:54:39,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=263180.0, ans=0.5 2024-08-09 23:54:52,292 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 23:55:11,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11850, loss[loss=0.1142, beats_loss=0.01316, ecapa_loss=0.0003456, whisper_loss=0.09757, over 17515.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01262, ecapa_loss=0.0003183, whisper_loss=0.09992, over 3891072.68 frames. ], batch size: 71, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:55:13,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=263480.0, ans=0.0 2024-08-09 23:55:24,313 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 23:55:24,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=263580.0, ans=15.0 2024-08-09 23:55:50,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=263780.0, ans=0.2 2024-08-09 23:56:12,212 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 23:56:18,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11900, loss[loss=0.1354, beats_loss=0.01061, ecapa_loss=0.0003186, whisper_loss=0.1216, over 23806.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01258, ecapa_loss=0.0003163, whisper_loss=0.1011, over 3931220.35 frames. ], batch size: 92, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:56:21,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.968e+01 3.550e+01 4.423e+01 6.843e+01, threshold=7.099e+01, percent-clipped=0.0 2024-08-09 23:56:32,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=264080.0, ans=0.0 2024-08-09 23:56:52,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264180.0, ans=0.125 2024-08-09 23:57:21,606 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 23:57:26,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 11950, loss[loss=0.09931, beats_loss=0.0122, ecapa_loss=0.0003462, whisper_loss=0.08365, over 17030.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01253, ecapa_loss=0.0003179, whisper_loss=0.1005, over 3903546.64 frames. ], batch size: 71, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:57:28,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.82 vs. 
limit=22.5 2024-08-09 23:57:30,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2024-08-09 23:57:38,334 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 23:57:38,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264480.0, ans=0.125 2024-08-09 23:57:40,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=264580.0, ans=0.125 2024-08-09 23:57:46,415 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 23:57:51,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=264580.0, ans=0.0 2024-08-09 23:57:53,587 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 23:57:53,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=264680.0, ans=0.07 2024-08-09 23:57:59,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-09 23:58:02,841 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 23:58:05,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=264680.0, ans=0.5 2024-08-09 23:58:14,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.88 vs. 
limit=15.0 2024-08-09 23:58:16,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=264780.0, ans=0.125 2024-08-09 23:58:26,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=264880.0, ans=0.0 2024-08-09 23:58:27,976 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 23:58:29,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-09 23:58:35,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12000, loss[loss=0.1107, beats_loss=0.01274, ecapa_loss=0.0002994, whisper_loss=0.09496, over 22665.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01256, ecapa_loss=0.0003156, whisper_loss=0.09946, over 3837523.85 frames. ], batch size: 88, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:58:35,881 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-09 23:59:15,218 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on ASR_libri: loss=0.2807, beats_loss=0, ecapa_loss=0.0009345, whisper_loss=0.2713, over 922467.00 frames. 2024-08-09 23:59:32,422 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on SV_voxceleb1: loss=0.008336, beats_loss=0, ecapa_loss=0.0008336, whisper_loss=0, over 939242.00 frames. 2024-08-10 00:01:27,297 INFO [train_multi_KD3.py:1149] (1/4) Epoch 2, validation on AT_audioset: loss=0.02968, beats_loss=0.02968, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 00:01:27,301 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 00:01:29,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.941e+01 3.442e+01 3.928e+01 6.406e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 00:02:31,840 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 00:02:37,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12050, loss[loss=0.1296, beats_loss=0.008016, ecapa_loss=0.0003961, whisper_loss=0.1177, over 14094.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01252, ecapa_loss=0.0003161, whisper_loss=0.09938, over 3851218.66 frames. ], batch size: 56, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:02:42,191 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 00:02:44,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=15.0 2024-08-10 00:02:45,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=265480.0, ans=0.125 2024-08-10 00:02:47,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=265480.0, ans=0.125 2024-08-10 00:02:49,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=265480.0, ans=0.125 2024-08-10 00:02:53,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=265580.0, ans=0.0 2024-08-10 00:02:58,588 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 00:03:21,068 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 00:03:36,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=265880.0, ans=0.1 2024-08-10 00:03:47,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12100, loss[loss=0.1378, beats_loss=0.009391, ecapa_loss=0.0003583, whisper_loss=0.1248, over 23027.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01254, ecapa_loss=0.0003167, whisper_loss=0.09926, over 3861164.59 frames. ], batch size: 92, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:03:50,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2024-08-10 00:03:50,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.134e+01 3.753e+01 4.563e+01 7.245e+01, threshold=7.507e+01, percent-clipped=1.0 2024-08-10 00:03:52,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265980.0, ans=0.1 2024-08-10 00:03:53,697 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 00:04:09,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=266080.0, ans=0.1 2024-08-10 00:04:19,018 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 00:04:19,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0 2024-08-10 00:04:27,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=266180.0, ans=0.2 2024-08-10 00:04:38,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. 
limit=15.0 2024-08-10 00:04:44,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=266380.0, ans=0.125 2024-08-10 00:04:47,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=15.0 2024-08-10 00:04:55,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=266380.0, ans=0.2 2024-08-10 00:04:56,754 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 00:04:57,867 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12150, loss[loss=0.1192, beats_loss=0.01237, ecapa_loss=0.0003034, whisper_loss=0.1038, over 21846.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01248, ecapa_loss=0.000318, whisper_loss=0.09984, over 3866579.79 frames. ], batch size: 89, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:04:58,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=266480.0, ans=0.1 2024-08-10 00:05:12,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266580.0, ans=0.1 2024-08-10 00:05:12,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=266580.0, ans=0.125 2024-08-10 00:05:20,464 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 00:05:26,341 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 00:05:40,500 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 00:05:44,687 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 00:05:49,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=266780.0, ans=0.125 2024-08-10 00:06:08,205 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12200, loss[loss=0.1115, beats_loss=0.01437, ecapa_loss=0.0003216, whisper_loss=0.09387, over 20580.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01248, ecapa_loss=0.0003172, whisper_loss=0.1004, over 3878316.15 frames. ], batch size: 85, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:06:11,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.872e+01 3.325e+01 3.813e+01 6.794e+01, threshold=6.650e+01, percent-clipped=0.0 2024-08-10 00:06:23,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=267080.0, ans=0.125 2024-08-10 00:06:29,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=267080.0, ans=0.125 2024-08-10 00:06:30,916 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 00:06:33,523 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 00:06:48,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=267180.0, ans=0.2 2024-08-10 00:06:51,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.65 vs. 
limit=15.0 2024-08-10 00:07:03,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=267380.0, ans=0.125 2024-08-10 00:07:11,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=267380.0, ans=0.2 2024-08-10 00:07:12,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0 2024-08-10 00:07:18,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12250, loss[loss=0.1143, beats_loss=0.0134, ecapa_loss=0.0002739, whisper_loss=0.09817, over 13947.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01234, ecapa_loss=0.0003169, whisper_loss=0.1012, over 3883103.16 frames. ], batch size: 55, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:07:52,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2024-08-10 00:07:56,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5 2024-08-10 00:08:07,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=267780.0, ans=0.05 2024-08-10 00:08:13,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=267880.0, ans=0.95 2024-08-10 00:08:27,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12300, loss[loss=0.1207, beats_loss=0.01384, ecapa_loss=0.0003064, whisper_loss=0.1038, over 22534.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01246, ecapa_loss=0.0003163, whisper_loss=0.1001, over 3881553.92 frames. 
], batch size: 93, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:08:30,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.986e+01 3.586e+01 4.164e+01 6.809e+01, threshold=7.172e+01, percent-clipped=1.0 2024-08-10 00:08:33,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-10 00:08:38,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267980.0, ans=0.1 2024-08-10 00:08:39,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=268080.0, ans=0.125 2024-08-10 00:08:48,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268080.0, ans=0.1 2024-08-10 00:08:53,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=268080.0, ans=0.0 2024-08-10 00:08:55,238 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 00:08:55,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=15.0 2024-08-10 00:09:06,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=268180.0, ans=0.125 2024-08-10 00:09:08,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=268280.0, ans=0.0 2024-08-10 00:09:20,851 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 00:09:32,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268380.0, ans=0.1 2024-08-10 00:09:36,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12350, loss[loss=0.1279, beats_loss=0.01309, ecapa_loss=0.0003831, whisper_loss=0.1109, over 18634.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01246, ecapa_loss=0.0003191, whisper_loss=0.1007, over 3863817.18 frames. ], batch size: 75, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:09:39,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=268480.0, ans=0.0 2024-08-10 00:10:01,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=268580.0, ans=0.2 2024-08-10 00:10:37,284 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 00:10:43,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268880.0, ans=0.125 2024-08-10 00:10:45,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=268880.0, ans=0.125 2024-08-10 00:10:48,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12400, loss[loss=0.1072, beats_loss=0.01394, ecapa_loss=0.0002459, whisper_loss=0.09083, over 17800.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01255, ecapa_loss=0.0003177, whisper_loss=0.09926, over 3839909.07 frames. ], batch size: 69, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:10:50,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.997e+01 3.426e+01 4.143e+01 8.992e+01, threshold=6.852e+01, percent-clipped=1.0 2024-08-10 00:10:53,983 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 00:10:54,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=268980.0, ans=0.2 2024-08-10 00:10:59,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-10 00:11:10,255 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.523e-01 2024-08-10 00:11:11,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=269080.0, ans=0.0 2024-08-10 00:11:14,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=269180.0, ans=0.0 2024-08-10 00:11:17,223 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 00:11:17,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2024-08-10 00:11:21,279 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 16 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 00:11:28,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269280.0, ans=0.1 2024-08-10 00:11:33,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-10 00:11:58,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12450, loss[loss=0.1194, beats_loss=0.01261, ecapa_loss=0.0002895, whisper_loss=0.1039, over 18079.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01255, ecapa_loss=0.0003167, whisper_loss=0.09899, over 3856427.95 frames. 
], batch size: 68, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:12:05,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=269480.0, ans=0.125 2024-08-10 00:12:20,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=269580.0, ans=0.2 2024-08-10 00:12:25,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=269680.0, ans=0.0 2024-08-10 00:12:57,249 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-10 00:13:08,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12500, loss[loss=0.1294, beats_loss=0.0115, ecapa_loss=0.0003554, whisper_loss=0.1143, over 21330.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01252, ecapa_loss=0.0003135, whisper_loss=0.09914, over 3863583.26 frames. ], batch size: 86, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:13:11,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.015e+01 3.443e+01 4.080e+01 3.263e+02, threshold=6.886e+01, percent-clipped=2.0 2024-08-10 00:13:13,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=269980.0, ans=0.125 2024-08-10 00:13:15,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=269980.0, ans=0.125 2024-08-10 00:13:22,838 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 00:13:29,248 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 00:13:39,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=270180.0, ans=0.0 2024-08-10 00:13:39,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-10 00:13:40,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=270180.0, ans=0.125 2024-08-10 00:13:41,833 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 00:13:46,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-10 00:13:55,403 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 00:13:55,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=270280.0, ans=0.0 2024-08-10 00:14:08,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=270380.0, ans=0.2 2024-08-10 00:14:17,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12550, loss[loss=0.1054, beats_loss=0.01315, ecapa_loss=0.0003382, whisper_loss=0.08887, over 21038.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01254, ecapa_loss=0.0003127, whisper_loss=0.09934, over 3861059.42 frames. ], batch size: 88, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:14:17,506 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 00:14:59,464 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 00:15:12,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=270880.0, ans=0.0 2024-08-10 00:15:26,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=15.0 2024-08-10 00:15:27,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12600, loss[loss=0.113, beats_loss=0.01372, ecapa_loss=0.0003147, whisper_loss=0.09613, over 19821.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01266, ecapa_loss=0.0003098, whisper_loss=0.09963, over 3870640.47 frames. ], batch size: 80, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:15:29,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=12.0 2024-08-10 00:15:30,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 3.077e+01 3.630e+01 3.984e+01 7.187e+01, threshold=7.260e+01, percent-clipped=1.0 2024-08-10 00:15:33,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=270980.0, ans=0.07 2024-08-10 00:15:39,155 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 00:15:54,555 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 00:16:00,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2024-08-10 00:16:01,752 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 00:16:17,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.29 vs. 
limit=10.0 2024-08-10 00:16:27,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.65 vs. limit=15.0 2024-08-10 00:16:32,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=271380.0, ans=0.125 2024-08-10 00:16:34,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=271380.0, ans=0.0 2024-08-10 00:16:37,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12650, loss[loss=0.1301, beats_loss=0.01067, ecapa_loss=0.0003008, whisper_loss=0.1164, over 22643.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01271, ecapa_loss=0.0003123, whisper_loss=0.09929, over 3855300.62 frames. ], batch size: 87, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:16:47,811 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 00:16:55,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=271580.0, ans=0.0 2024-08-10 00:16:55,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2024-08-10 00:17:11,499 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 00:17:15,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=271680.0, ans=0.0 2024-08-10 00:17:26,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-08-10 00:17:35,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.11 vs. 
limit=22.5 2024-08-10 00:17:43,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271880.0, ans=0.0 2024-08-10 00:17:45,194 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-10 00:17:46,363 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 00:17:47,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12700, loss[loss=0.09388, beats_loss=0.01267, ecapa_loss=0.0002908, whisper_loss=0.0783, over 18489.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01273, ecapa_loss=0.000312, whisper_loss=0.09917, over 3839651.63 frames. ], batch size: 72, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:17:50,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 3.012e+01 3.366e+01 3.844e+01 6.101e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 00:17:54,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271980.0, ans=0.1 2024-08-10 00:18:21,149 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 00:18:22,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=15.0 2024-08-10 00:18:26,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=272180.0, ans=0.125 2024-08-10 00:18:28,020 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 00:18:57,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12750, loss[loss=0.09065, beats_loss=0.01221, ecapa_loss=0.0003568, whisper_loss=0.07486, over 14230.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01276, ecapa_loss=0.0003111, whisper_loss=0.09959, over 3854791.63 frames. 
], batch size: 54, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:19:13,456 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 00:19:22,521 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.586e-01 2024-08-10 00:19:27,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=272680.0, ans=0.0 2024-08-10 00:19:32,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272680.0, ans=0.0 2024-08-10 00:19:52,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272880.0, ans=0.1 2024-08-10 00:19:57,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=12.0 2024-08-10 00:20:04,008 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 00:20:05,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2024-08-10 00:20:07,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12800, loss[loss=0.1118, beats_loss=0.009484, ecapa_loss=0.0002627, whisper_loss=0.09966, over 15416.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01274, ecapa_loss=0.0003145, whisper_loss=0.09958, over 3845283.18 frames. 
], batch size: 54, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:20:10,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 2.990e+01 3.546e+01 4.142e+01 8.927e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-10 00:20:16,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=272980.0, ans=0.0 2024-08-10 00:20:17,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=272980.0, ans=0.125 2024-08-10 00:20:29,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=273080.0, ans=0.2 2024-08-10 00:20:37,278 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 00:21:17,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=273480.0, ans=0.0 2024-08-10 00:21:18,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12850, loss[loss=0.1219, beats_loss=0.01229, ecapa_loss=0.0002941, whisper_loss=0.1067, over 18084.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01272, ecapa_loss=0.0003142, whisper_loss=0.09869, over 3806580.33 frames. ], batch size: 71, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:21:24,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=273480.0, ans=0.05 2024-08-10 00:21:29,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. limit=10.0 2024-08-10 00:21:51,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-10 00:21:53,604 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
14 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 00:21:54,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=273680.0, ans=0.0 2024-08-10 00:22:06,102 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 00:22:18,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2024-08-10 00:22:21,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=273880.0, ans=0.1 2024-08-10 00:22:22,008 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 00:22:23,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=273880.0, ans=0.125 2024-08-10 00:22:28,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12900, loss[loss=0.0931, beats_loss=0.01553, ecapa_loss=0.0002189, whisper_loss=0.07538, over 17681.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01271, ecapa_loss=0.000314, whisper_loss=0.09831, over 3791248.89 frames. ], batch size: 70, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:22:31,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.013e+01 3.364e+01 3.931e+01 6.029e+01, threshold=6.729e+01, percent-clipped=0.0 2024-08-10 00:22:40,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=273980.0, ans=0.0 2024-08-10 00:22:42,128 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 00:23:22,070 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 00:23:29,181 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 00:23:31,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=274380.0, ans=0.125 2024-08-10 00:23:35,116 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.112e-03 2024-08-10 00:23:35,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2024-08-10 00:23:40,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 12950, loss[loss=0.1318, beats_loss=0.008688, ecapa_loss=0.0003837, whisper_loss=0.1193, over 19552.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01265, ecapa_loss=0.0003108, whisper_loss=0.09866, over 3824404.97 frames. ], batch size: 76, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:23:57,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=274580.0, ans=0.125 2024-08-10 00:24:07,460 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 00:24:09,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=274680.0, ans=0.0 2024-08-10 00:24:27,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-08-10 00:24:29,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=274780.0, ans=0.125 2024-08-10 00:24:33,718 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 00:24:50,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13000, loss[loss=0.09356, beats_loss=0.01622, ecapa_loss=0.0002614, whisper_loss=0.07472, over 19086.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01263, ecapa_loss=0.000311, whisper_loss=0.09888, over 3838728.47 frames. ], batch size: 80, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:24:50,890 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 00:24:53,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.907e+01 3.154e+01 3.704e+01 5.779e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 00:25:20,261 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 00:25:35,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=275280.0, ans=0.125 2024-08-10 00:26:01,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13050, loss[loss=0.136, beats_loss=0.01132, ecapa_loss=0.0003468, whisper_loss=0.1212, over 17899.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01264, ecapa_loss=0.0003101, whisper_loss=0.09901, over 3824096.72 frames. ], batch size: 70, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:26:27,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275580.0, ans=0.125 2024-08-10 00:26:53,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=275780.0, ans=0.2 2024-08-10 00:26:53,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. 
limit=15.0 2024-08-10 00:26:59,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=275880.0, ans=0.0 2024-08-10 00:26:59,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=275880.0, ans=0.125 2024-08-10 00:27:08,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-10 00:27:12,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13100, loss[loss=0.1143, beats_loss=0.009886, ecapa_loss=0.0003647, whisper_loss=0.1008, over 15112.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01264, ecapa_loss=0.0003086, whisper_loss=0.09927, over 3842146.71 frames. ], batch size: 64, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:27:14,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.977e+01 3.328e+01 3.884e+01 7.929e+01, threshold=6.656e+01, percent-clipped=3.0 2024-08-10 00:27:21,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=275980.0, ans=0.0 2024-08-10 00:27:21,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=12.0 2024-08-10 00:27:23,877 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 00:27:37,599 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 00:27:39,550 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.219e+01 2024-08-10 00:27:46,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=276180.0, ans=0.2 2024-08-10 00:27:47,877 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
21 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 00:27:52,058 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 00:27:53,533 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 00:27:56,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=276280.0, ans=0.125 2024-08-10 00:28:05,037 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 00:28:11,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=276380.0, ans=0.125 2024-08-10 00:28:11,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=276380.0, ans=0.0 2024-08-10 00:28:23,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13150, loss[loss=0.1089, beats_loss=0.01255, ecapa_loss=0.0003202, whisper_loss=0.09319, over 18492.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01268, ecapa_loss=0.0003089, whisper_loss=0.09899, over 3835905.80 frames. ], batch size: 75, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:28:27,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=15.0 2024-08-10 00:28:27,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.16 vs. limit=10.0 2024-08-10 00:28:44,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=276580.0, ans=0.125 2024-08-10 00:28:51,760 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 00:28:51,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=276680.0, ans=0.125 2024-08-10 00:28:57,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=276680.0, ans=0.2 2024-08-10 00:29:03,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=276680.0, ans=0.0 2024-08-10 00:29:10,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276780.0, ans=0.1 2024-08-10 00:29:20,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=276880.0, ans=0.2 2024-08-10 00:29:31,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-10 00:29:33,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13200, loss[loss=0.1154, beats_loss=0.01372, ecapa_loss=0.0002608, whisper_loss=0.09911, over 19195.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01268, ecapa_loss=0.0003076, whisper_loss=0.09889, over 3863325.21 frames. 
], batch size: 73, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:29:33,874 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.440e+00 2024-08-10 00:29:33,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=276980.0, ans=0.0 2024-08-10 00:29:36,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 3.048e+01 3.557e+01 4.616e+01 6.724e+01, threshold=7.115e+01, percent-clipped=1.0 2024-08-10 00:29:36,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=276980.0, ans=0.0 2024-08-10 00:29:39,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=276980.0, ans=0.0 2024-08-10 00:29:57,832 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 00:30:19,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.74 vs. limit=10.0 2024-08-10 00:30:38,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=277380.0, ans=0.125 2024-08-10 00:30:39,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=277380.0, ans=0.125 2024-08-10 00:30:43,174 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13250, loss[loss=0.109, beats_loss=0.01389, ecapa_loss=0.0003298, whisper_loss=0.09177, over 20793.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01249, ecapa_loss=0.0003102, whisper_loss=0.09958, over 3859257.18 frames. 
], batch size: 85, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:30:46,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5 2024-08-10 00:31:03,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-08-10 00:31:12,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=277680.0, ans=0.2 2024-08-10 00:31:12,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=277680.0, ans=0.125 2024-08-10 00:31:15,439 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 00:31:16,608 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 00:31:16,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=277680.0, ans=0.05 2024-08-10 00:31:19,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=277680.0, ans=0.125 2024-08-10 00:31:28,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=15.0 2024-08-10 00:31:45,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=12.0 2024-08-10 00:31:45,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=277880.0, ans=0.0 2024-08-10 00:31:53,900 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 00:31:56,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13300, loss[loss=0.1117, beats_loss=0.01215, ecapa_loss=0.0003056, whisper_loss=0.09646, over 22882.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01249, ecapa_loss=0.0003104, whisper_loss=0.1, over 3861009.58 frames. ], batch size: 92, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:31:58,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=277980.0, ans=0.125 2024-08-10 00:31:59,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.953e+01 3.236e+01 3.823e+01 6.068e+01, threshold=6.472e+01, percent-clipped=0.0 2024-08-10 00:32:26,811 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 00:32:29,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=278180.0, ans=0.0 2024-08-10 00:32:41,065 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 00:32:41,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278180.0, ans=0.1 2024-08-10 00:32:46,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-10 00:33:03,658 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 00:33:14,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13350, loss[loss=0.1245, beats_loss=0.01313, ecapa_loss=0.0003242, whisper_loss=0.1081, over 19423.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01249, ecapa_loss=0.0003125, whisper_loss=0.1004, over 3871074.30 frames. 
], batch size: 80, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:33:34,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=278580.0, ans=0.0 2024-08-10 00:33:46,483 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 00:33:58,943 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 00:33:59,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=278780.0, ans=0.125 2024-08-10 00:34:21,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278880.0, ans=0.1 2024-08-10 00:34:24,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=278880.0, ans=0.0 2024-08-10 00:34:26,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=278880.0, ans=0.125 2024-08-10 00:34:31,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13400, loss[loss=0.1076, beats_loss=0.01498, ecapa_loss=0.0003427, whisper_loss=0.08917, over 21118.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01256, ecapa_loss=0.0003136, whisper_loss=0.09982, over 3875438.33 frames. ], batch size: 90, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:34:34,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.868e+01 3.242e+01 3.595e+01 7.666e+01, threshold=6.483e+01, percent-clipped=2.0 2024-08-10 00:34:52,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=279080.0, ans=0.0 2024-08-10 00:34:53,624 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-10 00:35:01,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279180.0, ans=0.1 2024-08-10 00:35:01,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279180.0, ans=0.1 2024-08-10 00:35:04,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=279180.0, ans=0.2 2024-08-10 00:35:14,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=279180.0, ans=0.125 2024-08-10 00:35:17,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=279280.0, ans=0.0 2024-08-10 00:35:18,572 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 00:35:29,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=279280.0, ans=0.125 2024-08-10 00:35:33,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=279380.0, ans=0.125 2024-08-10 00:35:48,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13450, loss[loss=0.1311, beats_loss=0.009504, ecapa_loss=0.0003074, whisper_loss=0.1185, over 17501.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01252, ecapa_loss=0.0003119, whisper_loss=0.1009, over 3887175.07 frames. 
], batch size: 66, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:36:58,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=279880.0, ans=0.0 2024-08-10 00:37:06,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13500, loss[loss=0.1171, beats_loss=0.01367, ecapa_loss=0.0003666, whisper_loss=0.0998, over 20408.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01247, ecapa_loss=0.0003128, whisper_loss=0.1009, over 3901494.31 frames. ], batch size: 87, lr: 2.24e-02, grad_scale: 262144.0 2024-08-10 00:37:13,008 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.053e+01 3.516e+01 4.040e+01 7.643e+01, threshold=7.031e+01, percent-clipped=3.0 2024-08-10 00:37:13,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=279980.0, ans=0.125 2024-08-10 00:37:28,731 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.645e+03 2024-08-10 00:37:34,610 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 00:37:42,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=280180.0, ans=0.125 2024-08-10 00:38:09,726 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
19 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 00:38:09,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=280380.0, ans=0.0 2024-08-10 00:38:19,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=280380.0, ans=0.0 2024-08-10 00:38:22,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=280380.0, ans=0.05 2024-08-10 00:38:24,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13550, loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0003775, whisper_loss=0.0897, over 14297.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01246, ecapa_loss=0.0003137, whisper_loss=0.1013, over 3898230.78 frames. ], batch size: 58, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:38:27,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2024-08-10 00:38:44,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-10 00:38:57,503 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:39:15,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=280780.0, ans=0.0 2024-08-10 00:39:19,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=15.0 2024-08-10 00:39:26,426 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 00:39:38,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=280880.0, ans=0.125 2024-08-10 00:39:41,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13600, loss[loss=0.09344, beats_loss=0.01368, ecapa_loss=0.0003489, whisper_loss=0.07627, over 16665.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01258, ecapa_loss=0.0003123, whisper_loss=0.1004, over 3881007.81 frames. ], batch size: 73, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:39:42,236 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.954e-01 2024-08-10 00:39:42,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2024-08-10 00:39:44,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.967e+01 3.461e+01 3.946e+01 7.975e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-10 00:40:04,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281080.0, ans=0.1 2024-08-10 00:40:07,110 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 00:40:13,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=281180.0, ans=0.0 2024-08-10 00:40:23,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. 
limit=6.0 2024-08-10 00:40:24,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=281180.0, ans=0.0 2024-08-10 00:40:25,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-10 00:40:32,144 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 00:40:40,401 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 00:41:00,932 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13650, loss[loss=0.1262, beats_loss=0.01191, ecapa_loss=0.0002991, whisper_loss=0.1113, over 23505.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01256, ecapa_loss=0.0003135, whisper_loss=0.1002, over 3892800.20 frames. ], batch size: 93, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:42:00,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=281780.0, ans=0.2 2024-08-10 00:42:04,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=281780.0, ans=0.2 2024-08-10 00:42:04,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=281780.0, ans=0.025 2024-08-10 00:42:04,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2024-08-10 00:42:22,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13700, loss[loss=0.1211, beats_loss=0.01141, ecapa_loss=0.0003203, whisper_loss=0.1065, over 14095.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01253, ecapa_loss=0.0003117, whisper_loss=0.1005, over 3894499.21 frames. 
], batch size: 56, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:42:25,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.951e+01 3.261e+01 3.919e+01 6.807e+01, threshold=6.522e+01, percent-clipped=0.0 2024-08-10 00:42:26,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=281980.0, ans=0.0 2024-08-10 00:42:33,268 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 00:42:33,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=281980.0, ans=0.0 2024-08-10 00:42:36,837 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 00:42:47,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-10 00:42:54,645 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 00:43:37,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=282380.0, ans=0.125 2024-08-10 00:43:37,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=282380.0, ans=0.0 2024-08-10 00:43:38,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=282380.0, ans=0.2 2024-08-10 00:43:44,150 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13750, loss[loss=0.1245, beats_loss=0.01248, ecapa_loss=0.0003063, whisper_loss=0.109, over 16385.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01251, ecapa_loss=0.000313, whisper_loss=0.1003, over 3903548.87 frames. 
], batch size: 63, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:43:54,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282480.0, ans=0.125 2024-08-10 00:44:01,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=282580.0, ans=0.125 2024-08-10 00:44:19,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=282680.0, ans=0.125 2024-08-10 00:44:21,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282680.0, ans=0.1 2024-08-10 00:44:27,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2024-08-10 00:44:36,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=282780.0, ans=0.125 2024-08-10 00:44:41,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=15.0 2024-08-10 00:44:44,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2024-08-10 00:44:51,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=282880.0, ans=0.0 2024-08-10 00:44:59,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=282880.0, ans=0.1 2024-08-10 00:45:02,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13800, loss[loss=0.1148, beats_loss=0.007723, ecapa_loss=0.0004174, whisper_loss=0.1029, over 14642.00 frames. 
], tot_loss[loss=0.1159, beats_loss=0.01238, ecapa_loss=0.0003143, whisper_loss=0.1004, over 3877082.80 frames. ], batch size: 60, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:45:06,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.944e+01 3.294e+01 3.829e+01 5.391e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-10 00:45:34,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283080.0, ans=0.0 2024-08-10 00:45:35,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=283080.0, ans=10.0 2024-08-10 00:45:44,082 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.015e+01 2024-08-10 00:45:58,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=283280.0, ans=0.07 2024-08-10 00:46:00,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=283280.0, ans=0.2 2024-08-10 00:46:01,523 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 00:46:06,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=283280.0, ans=0.0 2024-08-10 00:46:25,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13850, loss[loss=0.1222, beats_loss=0.01298, ecapa_loss=0.000346, whisper_loss=0.1057, over 19399.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01239, ecapa_loss=0.0003144, whisper_loss=0.1005, over 3891164.10 frames. ], batch size: 79, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:46:26,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. 
limit=10.0 2024-08-10 00:46:31,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283480.0, ans=0.1 2024-08-10 00:46:36,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0 2024-08-10 00:46:40,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-10 00:46:52,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=283580.0, ans=6.0 2024-08-10 00:46:57,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283680.0, ans=0.125 2024-08-10 00:46:58,090 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 00:46:58,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=283680.0, ans=0.0 2024-08-10 00:47:09,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283680.0, ans=0.125 2024-08-10 00:47:13,886 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 00:47:22,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=283780.0, ans=0.125 2024-08-10 00:47:25,573 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 00:47:28,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=283780.0, ans=0.125 2024-08-10 00:47:28,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=283780.0, ans=0.125 2024-08-10 00:47:30,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=283880.0, ans=0.05 2024-08-10 00:47:39,829 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 00:47:47,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13900, loss[loss=0.1068, beats_loss=0.01516, ecapa_loss=0.0003107, whisper_loss=0.08849, over 21681.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01245, ecapa_loss=0.0003143, whisper_loss=0.09954, over 3842977.37 frames. ], batch size: 90, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:47:50,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.945e+01 3.348e+01 3.878e+01 5.863e+01, threshold=6.696e+01, percent-clipped=0.0 2024-08-10 00:48:05,019 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 00:48:11,634 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 00:48:19,345 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.098e-02 2024-08-10 00:48:25,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.01 vs. 
limit=15.0 2024-08-10 00:48:33,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=284180.0, ans=0.2 2024-08-10 00:49:01,442 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 00:49:09,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 13950, loss[loss=0.1433, beats_loss=0.009812, ecapa_loss=0.0003102, whisper_loss=0.1304, over 21058.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0125, ecapa_loss=0.0003142, whisper_loss=0.0998, over 3882842.63 frames. ], batch size: 81, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:49:12,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=284480.0, ans=0.0 2024-08-10 00:49:17,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=284480.0, ans=0.0 2024-08-10 00:49:32,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284580.0, ans=0.1 2024-08-10 00:49:33,741 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 00:49:41,609 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 00:49:49,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. 
limit=10.0 2024-08-10 00:49:56,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=284680.0, ans=10.0 2024-08-10 00:50:09,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284780.0, ans=0.125 2024-08-10 00:50:09,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=284780.0, ans=0.1 2024-08-10 00:50:33,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14000, loss[loss=0.1313, beats_loss=0.01232, ecapa_loss=0.000297, whisper_loss=0.116, over 20776.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01248, ecapa_loss=0.000311, whisper_loss=0.1006, over 3921793.51 frames. ], batch size: 80, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:50:35,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.957e+01 3.357e+01 3.952e+01 6.248e+01, threshold=6.715e+01, percent-clipped=0.0 2024-08-10 00:50:38,376 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 00:50:38,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=284980.0, ans=0.125 2024-08-10 00:50:38,688 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.007e-01 2024-08-10 00:51:01,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=285080.0, ans=0.0 2024-08-10 00:51:10,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=285180.0, ans=0.0 2024-08-10 00:51:54,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14050, loss[loss=0.1233, beats_loss=0.01341, ecapa_loss=0.0003268, whisper_loss=0.1067, over 15135.00 frames. 
], tot_loss[loss=0.1167, beats_loss=0.01254, ecapa_loss=0.0003111, whisper_loss=0.101, over 3930646.99 frames. ], batch size: 59, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:52:07,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-08-10 00:52:09,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=285580.0, ans=0.0 2024-08-10 00:52:11,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285580.0, ans=0.125 2024-08-10 00:52:15,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285580.0, ans=0.1 2024-08-10 00:52:29,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=285680.0, ans=0.2 2024-08-10 00:52:32,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2024-08-10 00:52:46,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=285780.0, ans=0.0 2024-08-10 00:53:00,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285880.0, ans=0.1 2024-08-10 00:53:14,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285980.0, ans=0.125 2024-08-10 00:53:15,418 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14100, loss[loss=0.1216, beats_loss=0.01143, ecapa_loss=0.0002956, whisper_loss=0.1073, over 15844.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01257, ecapa_loss=0.000311, whisper_loss=0.1012, over 3924623.79 frames. 
], batch size: 63, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:53:18,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.998e+01 3.654e+01 4.043e+01 1.341e+02, threshold=7.307e+01, percent-clipped=1.0 2024-08-10 00:53:18,914 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 00:53:40,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=286080.0, ans=0.0 2024-08-10 00:53:40,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=286080.0, ans=0.125 2024-08-10 00:53:44,514 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.480e-01 2024-08-10 00:53:47,193 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 00:53:48,415 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 00:54:12,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2024-08-10 00:54:18,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=286280.0, ans=0.125 2024-08-10 00:54:25,473 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 00:54:27,325 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 00:54:35,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14150, loss[loss=0.08697, beats_loss=0.01299, ecapa_loss=0.0002586, whisper_loss=0.0714, over 14012.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01258, ecapa_loss=0.0003123, whisper_loss=0.1013, over 3902174.47 frames. 
], batch size: 56, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:54:36,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2024-08-10 00:54:39,134 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 00:54:41,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286480.0, ans=0.125 2024-08-10 00:54:44,730 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 00:55:02,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286580.0, ans=0.125 2024-08-10 00:55:04,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2024-08-10 00:55:13,922 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 00:55:16,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=286680.0, ans=0.125 2024-08-10 00:55:21,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=286680.0, ans=0.2 2024-08-10 00:55:32,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=286780.0, ans=0.2 2024-08-10 00:55:40,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=12.0 2024-08-10 00:55:46,141 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
24 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 00:55:53,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14200, loss[loss=0.0934, beats_loss=0.01323, ecapa_loss=0.0003239, whisper_loss=0.07694, over 20642.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01253, ecapa_loss=0.0003115, whisper_loss=0.1008, over 3913812.73 frames. ], batch size: 83, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:55:58,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 3.000e+01 3.388e+01 3.894e+01 5.742e+01, threshold=6.776e+01, percent-clipped=0.0 2024-08-10 00:56:39,189 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 00:56:46,000 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 00:56:56,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=287280.0, ans=0.125 2024-08-10 00:57:02,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2024-08-10 00:57:17,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-10 00:57:21,275 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 00:57:38,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14250, loss[loss=0.1143, beats_loss=0.01369, ecapa_loss=0.0002971, whisper_loss=0.09765, over 18574.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01248, ecapa_loss=0.0003113, whisper_loss=0.1013, over 3924300.04 frames. 
], batch size: 74, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:57:53,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2024-08-10 00:57:58,236 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 00:58:02,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=287580.0, ans=0.0 2024-08-10 00:58:11,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-10 00:58:14,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=287680.0, ans=0.125 2024-08-10 00:58:16,220 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 00:58:31,829 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 00:58:33,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=287780.0, ans=0.2 2024-08-10 00:58:40,429 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 00:59:14,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14300, loss[loss=0.1242, beats_loss=0.01544, ecapa_loss=0.0002742, whisper_loss=0.1061, over 23798.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01258, ecapa_loss=0.0003068, whisper_loss=0.1004, over 3935070.92 frames. 
], batch size: 96, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:59:19,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+01 3.147e+01 3.620e+01 4.133e+01 1.421e+02, threshold=7.240e+01, percent-clipped=1.0 2024-08-10 00:59:28,466 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 00:59:50,059 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 01:00:00,995 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 34 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 01:00:22,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=288180.0, ans=0.09899494936611666 2024-08-10 01:01:01,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-10 01:01:12,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14350, loss[loss=0.1216, beats_loss=0.009156, ecapa_loss=0.000373, whisper_loss=0.1087, over 15233.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01254, ecapa_loss=0.0003072, whisper_loss=0.1003, over 3906532.96 frames. ], batch size: 58, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:01:15,174 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 01:01:23,020 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 01:01:25,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=288480.0, ans=0.02 2024-08-10 01:01:26,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.66 vs. 
limit=22.5 2024-08-10 01:01:47,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=288580.0, ans=0.125 2024-08-10 01:02:36,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288780.0, ans=0.125 2024-08-10 01:03:08,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14400, loss[loss=0.1172, beats_loss=0.01304, ecapa_loss=0.0002927, whisper_loss=0.1013, over 22991.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01251, ecapa_loss=0.0003083, whisper_loss=0.1004, over 3901314.66 frames. ], batch size: 92, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:03:13,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.997e+01 3.365e+01 3.798e+01 7.821e+01, threshold=6.729e+01, percent-clipped=1.0 2024-08-10 01:03:19,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=288980.0, ans=22.5 2024-08-10 01:03:41,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=289080.0, ans=0.07 2024-08-10 01:03:42,559 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-10 01:03:52,852 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.185e+00 2024-08-10 01:04:10,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=289180.0, ans=0.07 2024-08-10 01:04:28,003 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 01:04:45,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 2, batch 14450, loss[loss=0.1074, beats_loss=0.01414, ecapa_loss=0.0003148, whisper_loss=0.09009, over 16817.00 frames. 
], tot_loss[loss=0.1158, beats_loss=0.01252, ecapa_loss=0.000309, whisper_loss=0.1002, over 3852723.75 frames. ], batch size: 68, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:04:50,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=289480.0, ans=0.125 2024-08-10 01:05:02,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=289580.0, ans=0.1 2024-08-10 01:05:08,220 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.126e+03 2024-08-10 01:05:08,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289580.0, ans=0.1 2024-08-10 01:05:16,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=289680.0, ans=0.0 2024-08-10 01:06:23,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 0, loss[loss=0.09658, beats_loss=0.01312, ecapa_loss=0.0003155, whisper_loss=0.08031, over 20702.00 frames. ], tot_loss[loss=0.09658, beats_loss=0.01312, ecapa_loss=0.0003155, whisper_loss=0.08031, over 20702.00 frames. ], batch size: 82, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:06:23,829 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 01:07:07,578 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on ASR_libri: loss=0.2782, beats_loss=0, ecapa_loss=0.0009143, whisper_loss=0.2691, over 922467.00 frames. 2024-08-10 01:07:23,481 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on SV_voxceleb1: loss=0.008083, beats_loss=0, ecapa_loss=0.0008083, whisper_loss=0, over 939242.00 frames. 
2024-08-10 01:08:18,783 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7516, 1.9217, 1.6624, 1.5984, 1.2532, 1.7200, 2.6148, 1.1311], device='cuda:1') 2024-08-10 01:09:27,978 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on AT_audioset: loss=0.02889, beats_loss=0.02889, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 01:09:27,980 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 01:09:50,274 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 01:09:52,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-10 01:10:02,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 3.015e+01 3.420e+01 3.932e+01 5.377e+01, threshold=6.841e+01, percent-clipped=0.0 2024-08-10 01:10:20,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=289980.0, ans=0.04949747468305833 2024-08-10 01:10:33,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=290080.0, ans=0.2 2024-08-10 01:11:38,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=290280.0, ans=6.0 2024-08-10 01:11:40,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=290380.0, ans=0.125 2024-08-10 01:11:42,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 50, loss[loss=0.1281, beats_loss=0.01243, ecapa_loss=0.000389, whisper_loss=0.1118, over 21128.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01238, ecapa_loss=0.0003255, whisper_loss=0.09953, over 869087.77 frames. 
], batch size: 89, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:11:45,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290380.0, ans=0.125 2024-08-10 01:12:06,202 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 01:12:21,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=290480.0, ans=0.125 2024-08-10 01:12:53,319 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-10 01:13:01,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=290680.0, ans=0.0 2024-08-10 01:13:01,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-10 01:13:17,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=290680.0, ans=0.125 2024-08-10 01:13:22,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=290780.0, ans=0.125 2024-08-10 01:13:27,813 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 01:13:41,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2024-08-10 01:13:46,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-10 01:13:48,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 100, loss[loss=0.1046, beats_loss=0.01223, ecapa_loss=0.000293, whisper_loss=0.08944, over 21826.00 frames. 
], tot_loss[loss=0.1156, beats_loss=0.01214, ecapa_loss=0.0003166, whisper_loss=0.1003, over 1549322.44 frames. ], batch size: 87, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:14:07,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=290880.0, ans=0.0 2024-08-10 01:14:18,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.835e+01 4.447e+01 6.801e+01, threshold=7.671e+01, percent-clipped=0.0 2024-08-10 01:14:52,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=291080.0, ans=0.125 2024-08-10 01:14:58,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=291080.0, ans=0.0 2024-08-10 01:15:21,497 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 01:15:23,626 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 01:15:31,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=291280.0, ans=0.2 2024-08-10 01:15:38,320 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 01:15:44,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 150, loss[loss=0.09983, beats_loss=0.01164, ecapa_loss=0.0002703, whisper_loss=0.08549, over 17643.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01223, ecapa_loss=0.0003088, whisper_loss=0.09987, over 2059364.99 frames. ], batch size: 68, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:15:45,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=15.0 2024-08-10 01:16:11,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291480.0, ans=0.1 2024-08-10 01:16:24,675 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 01:16:35,081 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 01:16:43,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=291680.0, ans=0.125 2024-08-10 01:16:46,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291680.0, ans=0.1 2024-08-10 01:16:58,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=291780.0, ans=0.0 2024-08-10 01:17:02,417 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 01:17:11,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 200, loss[loss=0.145, beats_loss=0.009226, ecapa_loss=0.0003309, whisper_loss=0.1325, over 23253.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01224, ecapa_loss=0.0003044, whisper_loss=0.09959, over 2456540.07 frames. ], batch size: 91, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:17:17,379 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 01:17:19,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-10 01:17:29,268 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 01:17:31,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.029e+01 3.361e+01 3.912e+01 9.673e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 01:17:41,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=291980.0, ans=0.125 2024-08-10 01:17:44,501 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 01:17:53,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=292080.0, ans=0.0 2024-08-10 01:18:09,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=292180.0, ans=0.125 2024-08-10 01:18:13,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-10 01:18:18,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=292280.0, ans=0.125 2024-08-10 01:18:22,396 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 01:18:31,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 250, loss[loss=0.1123, beats_loss=0.01315, ecapa_loss=0.0002564, whisper_loss=0.09659, over 16891.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01232, ecapa_loss=0.0002985, whisper_loss=0.09871, over 2752073.53 frames. ], batch size: 65, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:18:44,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=12.0 2024-08-10 01:18:51,247 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-10 01:18:54,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=292480.0, ans=0.0
2024-08-10 01:19:16,281 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 from AS
2024-08-10 01:19:16,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292680.0, ans=0.125
2024-08-10 01:19:23,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=292680.0, ans=0.125
2024-08-10 01:19:25,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=292680.0, ans=0.0
2024-08-10 01:19:31,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0
2024-08-10 01:19:35,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=292780.0, ans=0.0
2024-08-10 01:19:45,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=292780.0, ans=0.0
2024-08-10 01:19:46,648 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS
2024-08-10 01:19:47,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 300, loss[loss=0.1113, beats_loss=0.01302, ecapa_loss=0.0002381, whisper_loss=0.09588, over 22354.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01233, ecapa_loss=0.0002977, whisper_loss=0.09934, over 2990396.94 frames. ], batch size: 88, lr: 2.09e-02, grad_scale: 524288.0
2024-08-10 01:19:59,471 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-10 01:19:59,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=292880.0, ans=0.2
2024-08-10 01:20:05,427 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 from AS
2024-08-10 01:20:06,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.157e+01 3.521e+01 4.168e+01 6.266e+01, threshold=7.043e+01, percent-clipped=0.0
2024-08-10 01:20:11,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=292980.0, ans=0.125
2024-08-10 01:20:11,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0
2024-08-10 01:20:13,940 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 from AS
2024-08-10 01:20:14,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=292980.0, ans=0.125
2024-08-10 01:20:14,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=292980.0, ans=0.0
2024-08-10 01:20:32,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=293080.0, ans=0.0
2024-08-10 01:20:32,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293080.0, ans=0.1
2024-08-10 01:20:34,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=293180.0, ans=0.125
2024-08-10 01:20:50,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0
2024-08-10 01:20:58,617 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 from AS
2024-08-10 01:20:59,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0
2024-08-10 01:21:00,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.96 vs. limit=15.0
2024-08-10 01:21:06,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 350, loss[loss=0.1196, beats_loss=0.01315, ecapa_loss=0.0002432, whisper_loss=0.104, over 15093.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01235, ecapa_loss=0.0002937, whisper_loss=0.09874, over 3159499.89 frames. ], batch size: 56, lr: 2.09e-02, grad_scale: 524288.0
2024-08-10 01:21:12,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=293380.0, ans=0.0
2024-08-10 01:21:32,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=293480.0, ans=0.125
2024-08-10 01:21:32,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=293480.0, ans=0.125
2024-08-10 01:21:33,304 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.227e+00
2024-08-10 01:21:43,046 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS
2024-08-10 01:22:01,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=293680.0, ans=0.0
2024-08-10 01:22:07,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=293780.0, ans=0.0
2024-08-10 01:22:08,216 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 from AS
2024-08-10 01:22:21,719 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 400, loss[loss=0.08457, beats_loss=0.01237, ecapa_loss=0.0002349, whisper_loss=0.06985, over 15656.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01228, ecapa_loss=0.0002943, whisper_loss=0.1001, over 3340272.93 frames. ], batch size: 58, lr: 2.09e-02, grad_scale: 524288.0
2024-08-10 01:22:39,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.898e+01 3.177e+01 4.000e+01 8.293e+01, threshold=6.353e+01, percent-clipped=1.0
2024-08-10 01:22:41,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=293980.0, ans=0.0
2024-08-10 01:22:41,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=293980.0, ans=0.0
2024-08-10 01:22:48,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=293980.0, ans=0.125
2024-08-10 01:22:53,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=294080.0, ans=0.0
2024-08-10 01:22:56,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=294080.0, ans=0.0
2024-08-10 01:23:00,710 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS
2024-08-10 01:23:09,825 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 01:23:20,628 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 from AS
2024-08-10 01:23:22,348 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 from AS
2024-08-10 01:23:37,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 450, loss[loss=0.1255, beats_loss=0.01139, ecapa_loss=0.0002798, whisper_loss=0.1113, over 17572.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01233, ecapa_loss=0.000293, whisper_loss=0.09931, over 3433047.85 frames. ], batch size: 68, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:23:39,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=294380.0, ans=0.0
2024-08-10 01:23:53,370 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 from AS
2024-08-10 01:23:59,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=294480.0, ans=0.125
2024-08-10 01:24:02,291 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 from AS
2024-08-10 01:24:07,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294580.0, ans=0.125
2024-08-10 01:24:29,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0
2024-08-10 01:24:31,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=15.0
2024-08-10 01:24:41,229 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS
2024-08-10 01:24:48,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0
2024-08-10 01:24:52,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 500, loss[loss=0.1224, beats_loss=0.01459, ecapa_loss=0.0002406, whisper_loss=0.1054, over 18321.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01222, ecapa_loss=0.0002905, whisper_loss=0.09994, over 3520953.84 frames. ], batch size: 73, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:24:52,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=294880.0, ans=0.1
2024-08-10 01:24:54,868 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 from AS
2024-08-10 01:24:59,167 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 25 from Vox, 28 from AS
2024-08-10 01:25:04,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5
2024-08-10 01:25:09,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.966e+01 3.370e+01 3.826e+01 6.580e+01, threshold=6.739e+01, percent-clipped=1.0
2024-08-10 01:25:41,637 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS
2024-08-10 01:25:44,189 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 17 from Vox, 32 from AS
2024-08-10 01:25:55,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=295280.0, ans=0.0
2024-08-10 01:26:05,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 550, loss[loss=0.1069, beats_loss=0.0118, ecapa_loss=0.0003108, whisper_loss=0.09204, over 20088.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01232, ecapa_loss=0.0002877, whisper_loss=0.09929, over 3605521.66 frames. ], batch size: 78, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:26:18,395 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 from AS
2024-08-10 01:26:33,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=295480.0, ans=0.125
2024-08-10 01:26:36,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=295580.0, ans=0.0
2024-08-10 01:26:40,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=295580.0, ans=0.0
2024-08-10 01:26:45,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=295580.0, ans=0.125
2024-08-10 01:27:06,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.35 vs. limit=10.0
2024-08-10 01:27:18,059 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 24 from Vox, 23 from AS
2024-08-10 01:27:18,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=295780.0, ans=0.125
2024-08-10 01:27:20,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 600, loss[loss=0.1212, beats_loss=0.01296, ecapa_loss=0.0002814, whisper_loss=0.1055, over 22455.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.0124, ecapa_loss=0.000287, whisper_loss=0.09996, over 3677442.60 frames. ], batch size: 90, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:27:24,058 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 from AS
2024-08-10 01:27:30,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=295880.0, ans=0.125
2024-08-10 01:27:32,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295880.0, ans=0.1
2024-08-10 01:27:33,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=295880.0, ans=0.2
2024-08-10 01:27:38,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.875e+01 3.342e+01 3.961e+01 6.306e+01, threshold=6.685e+01, percent-clipped=0.0
2024-08-10 01:27:50,823 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 from AS
2024-08-10 01:27:58,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296080.0, ans=0.125
2024-08-10 01:28:01,315 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.353e-02
2024-08-10 01:28:04,921 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0
2024-08-10 01:28:17,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=296180.0, ans=0.125
2024-08-10 01:28:34,367 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 12 from Vox, 39 from AS
2024-08-10 01:28:36,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 650, loss[loss=0.109, beats_loss=0.01386, ecapa_loss=0.0002483, whisper_loss=0.09268, over 19213.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01235, ecapa_loss=0.0002875, whisper_loss=0.1005, over 3748718.61 frames. ], batch size: 78, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:28:36,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296380.0, ans=0.1
2024-08-10 01:28:43,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=296380.0, ans=0.2
2024-08-10 01:28:51,712 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 39 from LS+wenet, 12 from Vox, 38 from AS
2024-08-10 01:28:52,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=296480.0, ans=0.02
2024-08-10 01:28:52,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=296480.0, ans=0.07
2024-08-10 01:28:58,865 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 29 from Vox, 33 from AS
2024-08-10 01:29:14,150 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 from AS
2024-08-10 01:29:24,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=296680.0, ans=0.0
2024-08-10 01:29:33,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=296780.0, ans=0.0
2024-08-10 01:29:48,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 700, loss[loss=0.1217, beats_loss=0.01154, ecapa_loss=0.0002578, whisper_loss=0.1076, over 19010.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01234, ecapa_loss=0.0002892, whisper_loss=0.1005, over 3777351.08 frames. ], batch size: 73, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:30:01,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0
2024-08-10 01:30:07,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.824e+01 3.267e+01 4.012e+01 5.256e+01, threshold=6.535e+01, percent-clipped=0.0
2024-08-10 01:30:11,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=296980.0, ans=0.07
2024-08-10 01:30:19,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297080.0, ans=0.1
2024-08-10 01:30:47,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=297180.0, ans=0.0
2024-08-10 01:31:03,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=297280.0, ans=0.0
2024-08-10 01:31:04,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=297380.0, ans=0.125
2024-08-10 01:31:05,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 750, loss[loss=0.1007, beats_loss=0.01408, ecapa_loss=0.0003298, whisper_loss=0.08334, over 21442.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01235, ecapa_loss=0.0002888, whisper_loss=0.1, over 3789369.39 frames. ], batch size: 93, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:31:16,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297380.0, ans=0.1
2024-08-10 01:31:29,510 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 01:31:37,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=297580.0, ans=0.2
2024-08-10 01:31:45,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=297580.0, ans=0.0
2024-08-10 01:31:57,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=297680.0, ans=0.015
2024-08-10 01:32:04,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=297780.0, ans=0.125
2024-08-10 01:32:18,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 800, loss[loss=0.09978, beats_loss=0.01087, ecapa_loss=0.0003046, whisper_loss=0.08587, over 16491.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01235, ecapa_loss=0.0002875, whisper_loss=0.09963, over 3787114.64 frames. ], batch size: 62, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:32:23,401 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 from AS
2024-08-10 01:32:30,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=297880.0, ans=0.025
2024-08-10 01:32:35,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.843e+01 3.241e+01 3.911e+01 6.650e+01, threshold=6.482e+01, percent-clipped=1.0
2024-08-10 01:32:38,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0
2024-08-10 01:32:45,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297980.0, ans=0.1
2024-08-10 01:33:03,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0
2024-08-10 01:33:33,047 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 850, loss[loss=0.1193, beats_loss=0.01294, ecapa_loss=0.0002383, whisper_loss=0.104, over 15929.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01237, ecapa_loss=0.0002856, whisper_loss=0.09898, over 3784331.00 frames. ], batch size: 59, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:33:34,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5
2024-08-10 01:33:37,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2024-08-10 01:33:46,198 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 37 from LS+wenet, 25 from Vox, 24 from AS
2024-08-10 01:34:01,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5
2024-08-10 01:34:12,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=298580.0, ans=0.125
2024-08-10 01:34:20,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=298680.0, ans=0.0
2024-08-10 01:34:32,256 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS
2024-08-10 01:34:35,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.06 vs. limit=22.5
2024-08-10 01:34:41,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=298780.0, ans=0.125
2024-08-10 01:34:48,354 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 900, loss[loss=0.1361, beats_loss=0.01071, ecapa_loss=0.0003225, whisper_loss=0.1222, over 16552.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01234, ecapa_loss=0.0002857, whisper_loss=0.09878, over 3795961.72 frames. ], batch size: 65, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:34:56,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=298880.0, ans=0.125
2024-08-10 01:35:04,900 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 from AS
2024-08-10 01:35:06,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.811e+01 3.274e+01 3.784e+01 5.899e+01, threshold=6.548e+01, percent-clipped=0.0
2024-08-10 01:35:13,685 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 from AS
2024-08-10 01:35:26,004 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS
2024-08-10 01:35:56,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=299280.0, ans=0.0
2024-08-10 01:36:03,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 950, loss[loss=0.1145, beats_loss=0.01382, ecapa_loss=0.000206, whisper_loss=0.09863, over 22424.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01234, ecapa_loss=0.0002851, whisper_loss=0.0987, over 3812056.53 frames. ], batch size: 86, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:36:03,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=299380.0, ans=0.07
2024-08-10 01:36:05,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=299380.0, ans=0.125
2024-08-10 01:36:11,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=299380.0, ans=0.2
2024-08-10 01:36:11,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0
2024-08-10 01:36:13,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=299380.0, ans=0.125
2024-08-10 01:36:13,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=299380.0, ans=0.2
2024-08-10 01:36:14,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5
2024-08-10 01:36:15,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0
2024-08-10 01:36:24,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=299480.0, ans=10.0
2024-08-10 01:36:25,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=299480.0, ans=0.0
2024-08-10 01:37:14,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=299780.0, ans=0.09899494936611666
2024-08-10 01:37:18,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1000, loss[loss=0.1447, beats_loss=0.01064, ecapa_loss=0.0002667, whisper_loss=0.1313, over 14920.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01238, ecapa_loss=0.0002829, whisper_loss=0.09892, over 3809330.57 frames. ], batch size: 55, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:37:24,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=299880.0, ans=0.125
2024-08-10 01:37:37,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.926e+01 3.322e+01 3.689e+01 5.712e+01, threshold=6.643e+01, percent-clipped=0.0
2024-08-10 01:37:45,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=299980.0, ans=0.2
2024-08-10 01:37:51,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=300080.0, ans=0.0
2024-08-10 01:38:16,536 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS
2024-08-10 01:38:18,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5
2024-08-10 01:38:24,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=300280.0, ans=0.125
2024-08-10 01:38:34,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1050, loss[loss=0.1253, beats_loss=0.008794, ecapa_loss=0.0003949, whisper_loss=0.1125, over 16563.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.0002814, whisper_loss=0.09895, over 3799744.48 frames. ], batch size: 67, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:38:49,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=300480.0, ans=0.125
2024-08-10 01:39:15,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=300580.0, ans=0.125
2024-08-10 01:39:18,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=300680.0, ans=0.2
2024-08-10 01:39:30,454 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 from AS
2024-08-10 01:39:50,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1100, loss[loss=0.1175, beats_loss=0.01284, ecapa_loss=0.0002459, whisper_loss=0.1022, over 19225.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01245, ecapa_loss=0.0002799, whisper_loss=0.09882, over 3823404.27 frames. ], batch size: 74, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:40:02,546 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 from AS
2024-08-10 01:40:02,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=300880.0, ans=0.0
2024-08-10 01:40:07,329 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS
2024-08-10 01:40:08,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.873e+01 3.261e+01 3.724e+01 5.464e+01, threshold=6.522e+01, percent-clipped=0.0
2024-08-10 01:40:11,577 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 from AS
2024-08-10 01:40:13,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5
2024-08-10 01:40:32,749 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0
2024-08-10 01:41:04,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1150, loss[loss=0.1419, beats_loss=0.0101, ecapa_loss=0.0002738, whisper_loss=0.1291, over 19897.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01234, ecapa_loss=0.0002819, whisper_loss=0.09932, over 3824055.93 frames. ], batch size: 73, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:41:09,083 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS
2024-08-10 01:41:22,020 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-10 01:41:34,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=301580.0, ans=0.125
2024-08-10 01:42:02,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301780.0, ans=0.1
2024-08-10 01:42:19,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1200, loss[loss=0.1115, beats_loss=0.01039, ecapa_loss=0.0003191, whisper_loss=0.09787, over 15606.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01239, ecapa_loss=0.0002816, whisper_loss=0.09915, over 3825918.69 frames. ], batch size: 62, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:42:36,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.802e+01 3.225e+01 3.750e+01 6.302e+01, threshold=6.450e+01, percent-clipped=0.0
2024-08-10 01:42:49,651 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 from AS
2024-08-10 01:42:59,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.99 vs. limit=5.0
2024-08-10 01:43:02,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=302180.0, ans=0.125
2024-08-10 01:43:12,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=302180.0, ans=0.125
2024-08-10 01:43:16,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=302180.0, ans=0.0
2024-08-10 01:43:24,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5
2024-08-10 01:43:33,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1250, loss[loss=0.1126, beats_loss=0.01123, ecapa_loss=0.0002435, whisper_loss=0.09891, over 17082.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01232, ecapa_loss=0.0002817, whisper_loss=0.09942, over 3823052.23 frames. ], batch size: 65, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:43:39,540 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 from AS
2024-08-10 01:43:45,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=302380.0, ans=0.125
2024-08-10 01:43:53,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=302480.0, ans=0.0
2024-08-10 01:44:16,022 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 23 from Vox, 22 from AS
2024-08-10 01:44:25,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302680.0, ans=0.1
2024-08-10 01:44:30,186 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 from AS
2024-08-10 01:44:35,122 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-10 01:44:43,824 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 30 from LS+wenet, 19 from Vox, 22 from AS
2024-08-10 01:44:48,702 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1300, loss[loss=0.1064, beats_loss=0.01152, ecapa_loss=0.0002822, whisper_loss=0.0921, over 15197.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01225, ecapa_loss=0.0002828, whisper_loss=0.09952, over 3821078.16 frames. ], batch size: 59, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:44:48,907 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 from AS
2024-08-10 01:45:08,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.884e+01 3.264e+01 3.595e+01 5.329e+01, threshold=6.528e+01, percent-clipped=0.0
2024-08-10 01:45:09,558 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 from AS
2024-08-10 01:45:25,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=303080.0, ans=0.125
2024-08-10 01:45:34,128 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 from AS
2024-08-10 01:45:53,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=303280.0, ans=0.125
2024-08-10 01:46:00,748 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 from AS
2024-08-10 01:46:09,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0
2024-08-10 01:46:10,189 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1350, loss[loss=0.125, beats_loss=0.01262, ecapa_loss=0.0002676, whisper_loss=0.1097, over 19609.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01229, ecapa_loss=0.0002813, whisper_loss=0.09979, over 3832863.23 frames. ], batch size: 78, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:46:18,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=10.0
2024-08-10 01:46:25,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-10 01:46:30,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=303480.0, ans=0.125
2024-08-10 01:46:35,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0
2024-08-10 01:46:36,723 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 33 from Vox, 25 from AS
2024-08-10 01:46:38,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=303480.0, ans=0.2
2024-08-10 01:46:42,823 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS
2024-08-10 01:47:06,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303680.0, ans=0.125
2024-08-10 01:47:26,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1400, loss[loss=0.1255, beats_loss=0.01228, ecapa_loss=0.0003266, whisper_loss=0.1099, over 15035.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01212, ecapa_loss=0.0002828, whisper_loss=0.1004, over 3853593.93 frames. ], batch size: 60, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:47:28,634 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 26 from Vox, 35 from AS
2024-08-10 01:47:44,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.877e+01 3.100e+01 3.641e+01 7.400e+01, threshold=6.199e+01, percent-clipped=1.0
2024-08-10 01:47:44,592 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 from AS
2024-08-10 01:47:48,242 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.963e+00
2024-08-10 01:47:49,368 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 from AS
2024-08-10 01:47:51,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=303980.0, ans=0.125
2024-08-10 01:48:33,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=304280.0, ans=0.2
2024-08-10 01:48:38,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304280.0, ans=0.1
2024-08-10 01:49:10,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1450, loss[loss=0.09227, beats_loss=0.01538, ecapa_loss=0.0001826, whisper_loss=0.07507, over 13658.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01217, ecapa_loss=0.0002826, whisper_loss=0.09966, over 3867414.51 frames. ], batch size: 53, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:49:12,613 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 from AS
2024-08-10 01:49:28,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=304480.0, ans=0.0
2024-08-10 01:49:36,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=304480.0, ans=0.0
2024-08-10 01:49:38,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.02 vs. limit=10.0
2024-08-10 01:49:40,389 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 from AS
2024-08-10 01:49:48,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=304580.0, ans=10.0
2024-08-10 01:50:10,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs.
limit=22.5 2024-08-10 01:50:17,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-10 01:50:21,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=304780.0, ans=0.125 2024-08-10 01:50:30,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1500, loss[loss=0.1256, beats_loss=0.01285, ecapa_loss=0.0002638, whisper_loss=0.1101, over 18862.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01217, ecapa_loss=0.000282, whisper_loss=0.09943, over 3838500.34 frames. ], batch size: 73, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:50:42,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=304880.0, ans=0.125 2024-08-10 01:50:49,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.825e+01 3.192e+01 3.755e+01 6.662e+01, threshold=6.384e+01, percent-clipped=1.0 2024-08-10 01:50:50,009 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 01:50:51,586 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 01:50:59,546 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 01:51:17,005 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 01:51:24,540 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 01:51:30,500 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 01:51:44,727 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
22 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-10 01:51:48,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1550, loss[loss=0.1137, beats_loss=0.01419, ecapa_loss=0.0002256, whisper_loss=0.09722, over 23909.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01221, ecapa_loss=0.0002818, whisper_loss=0.09917, over 3832587.29 frames. ], batch size: 91, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:51:55,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=305380.0, ans=0.125 2024-08-10 01:52:00,760 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.581e+00 2024-08-10 01:52:11,515 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 01:52:18,120 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 01:52:37,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305680.0, ans=0.1 2024-08-10 01:52:39,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=22.5 2024-08-10 01:52:44,573 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 01:53:07,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1600, loss[loss=0.1076, beats_loss=0.01281, ecapa_loss=0.0002557, whisper_loss=0.09219, over 14608.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01225, ecapa_loss=0.0002805, whisper_loss=0.09923, over 3850890.54 frames. 
], batch size: 57, lr: 2.05e-02, grad_scale: 1048576.0 2024-08-10 01:53:19,162 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.782e+00 2024-08-10 01:53:23,602 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.192e+05 2024-08-10 01:53:25,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-10 01:53:27,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.961e+01 3.443e+01 4.067e+01 6.226e+01, threshold=6.887e+01, percent-clipped=0.0 2024-08-10 01:53:39,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306080.0, ans=0.1 2024-08-10 01:53:44,845 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 01:53:52,924 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.858e-03 2024-08-10 01:54:10,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=306280.0, ans=0.125 2024-08-10 01:54:26,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1650, loss[loss=0.09248, beats_loss=0.01383, ecapa_loss=0.0002824, whisper_loss=0.07582, over 17712.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01218, ecapa_loss=0.0002819, whisper_loss=0.0998, over 3838853.53 frames. ], batch size: 72, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:54:31,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. 
limit=15.0 2024-08-10 01:54:46,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=306480.0, ans=0.125 2024-08-10 01:54:47,721 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-10 01:55:12,705 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 01:55:23,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=306680.0, ans=0.125 2024-08-10 01:55:43,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1700, loss[loss=0.09289, beats_loss=0.01225, ecapa_loss=0.0002885, whisper_loss=0.07775, over 14579.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01217, ecapa_loss=0.0002815, whisper_loss=0.1002, over 3839795.83 frames. ], batch size: 59, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:55:47,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=306880.0, ans=0.125 2024-08-10 01:56:01,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.006e+01 3.281e+01 3.850e+01 2.955e+02, threshold=6.563e+01, percent-clipped=2.0 2024-08-10 01:56:03,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=306980.0, ans=0.04949747468305833 2024-08-10 01:56:13,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2024-08-10 01:56:20,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=307080.0, ans=0.125 2024-08-10 01:56:27,104 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 01:56:28,539 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 01:56:35,561 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 01:56:48,726 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 01:56:52,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=307280.0, ans=0.2 2024-08-10 01:56:57,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1750, loss[loss=0.1207, beats_loss=0.01076, ecapa_loss=0.0003208, whisper_loss=0.1067, over 18746.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.0122, ecapa_loss=0.0002796, whisper_loss=0.09985, over 3863453.74 frames. ], batch size: 76, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:57:06,591 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 01:57:07,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=307380.0, ans=0.0 2024-08-10 01:57:27,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=307580.0, ans=0.2 2024-08-10 01:57:34,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-08-10 01:57:36,256 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 17 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 01:57:41,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-08-10 01:57:50,316 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 01:57:51,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=307680.0, ans=0.125 2024-08-10 01:57:53,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=307680.0, ans=0.0 2024-08-10 01:57:59,871 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 01:58:01,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=307780.0, ans=0.0 2024-08-10 01:58:09,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1800, loss[loss=0.1192, beats_loss=0.01049, ecapa_loss=0.0002915, whisper_loss=0.1058, over 22937.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01222, ecapa_loss=0.0002775, whisper_loss=0.09977, over 3842987.85 frames. ], batch size: 89, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:58:25,222 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 01:58:26,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.751e+01 3.157e+01 3.582e+01 5.631e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 01:58:29,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=307980.0, ans=0.2 2024-08-10 01:58:34,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 01:58:49,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=308080.0, ans=0.0 2024-08-10 01:58:58,710 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 01:59:06,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.35 vs. limit=15.0 2024-08-10 01:59:10,885 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 01:59:12,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=308280.0, ans=0.125 2024-08-10 01:59:15,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=308280.0, ans=0.0 2024-08-10 01:59:20,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1850, loss[loss=0.09144, beats_loss=0.01195, ecapa_loss=0.0002748, whisper_loss=0.07674, over 17726.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.0123, ecapa_loss=0.0002793, whisper_loss=0.09831, over 3834385.50 frames. ], batch size: 68, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 01:59:43,768 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 01:59:55,858 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 9 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 01:59:57,153 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 02:00:03,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=308680.0, ans=0.0 2024-08-10 02:00:15,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308780.0, ans=0.1 2024-08-10 02:00:30,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1900, loss[loss=0.109, beats_loss=0.01009, ecapa_loss=0.0004349, whisper_loss=0.09458, over 21274.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01239, ecapa_loss=0.0002853, whisper_loss=0.09763, over 3850405.56 frames. 
], batch size: 90, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 02:00:37,903 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.240e+00 2024-08-10 02:00:47,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.899e+01 3.416e+01 4.271e+01 7.702e+01, threshold=6.832e+01, percent-clipped=2.0 2024-08-10 02:01:03,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=309080.0, ans=0.035 2024-08-10 02:01:13,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.53 vs. limit=22.5 2024-08-10 02:01:15,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=309180.0, ans=0.0 2024-08-10 02:01:19,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2024-08-10 02:01:23,132 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 02:01:39,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 1950, loss[loss=0.1033, beats_loss=0.01379, ecapa_loss=0.000205, whisper_loss=0.08747, over 14950.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01234, ecapa_loss=0.0002912, whisper_loss=0.09786, over 3811891.56 frames. ], batch size: 55, lr: 2.04e-02, grad_scale: 1048576.0 2024-08-10 02:01:55,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=309480.0, ans=0.1 2024-08-10 02:01:58,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309480.0, ans=0.125 2024-08-10 02:02:15,008 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 02:02:26,090 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:02:34,358 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-10 02:02:34,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309680.0, ans=0.0 2024-08-10 02:02:44,593 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 02:02:47,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=309780.0, ans=0.0 2024-08-10 02:02:48,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=12.0 2024-08-10 02:02:49,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2024-08-10 02:02:51,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2000, loss[loss=0.1099, beats_loss=0.01103, ecapa_loss=0.0002866, whisper_loss=0.09603, over 18151.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01243, ecapa_loss=0.0002921, whisper_loss=0.09816, over 3832269.57 frames. 
], batch size: 70, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:03:01,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309880.0, ans=0.0 2024-08-10 02:03:09,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.983e+01 3.552e+01 3.984e+01 6.262e+01, threshold=7.103e+01, percent-clipped=0.0 2024-08-10 02:03:14,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=309980.0, ans=0.125 2024-08-10 02:03:23,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=310080.0, ans=0.0 2024-08-10 02:03:26,969 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-10 02:03:35,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=310180.0, ans=0.125 2024-08-10 02:03:40,095 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 02:03:48,135 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 02:03:57,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=310280.0, ans=0.125 2024-08-10 02:04:03,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2050, loss[loss=0.1256, beats_loss=0.0118, ecapa_loss=0.0002844, whisper_loss=0.111, over 22483.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01242, ecapa_loss=0.0002944, whisper_loss=0.098, over 3838130.26 frames. ], batch size: 86, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:04:12,122 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 02:04:18,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=310480.0, ans=0.015 2024-08-10 02:04:32,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=310580.0, ans=0.2 2024-08-10 02:04:42,798 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 02:04:49,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=310680.0, ans=0.125 2024-08-10 02:04:55,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2024-08-10 02:05:02,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=310780.0, ans=0.2 2024-08-10 02:05:12,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=310880.0, ans=0.2 2024-08-10 02:05:13,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2100, loss[loss=0.1284, beats_loss=0.01313, ecapa_loss=0.0003062, whisper_loss=0.1122, over 22937.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.0125, ecapa_loss=0.0002962, whisper_loss=0.09783, over 3818535.01 frames. ], batch size: 90, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:05:29,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.901e+01 3.264e+01 3.705e+01 5.595e+01, threshold=6.528e+01, percent-clipped=0.0 2024-08-10 02:05:35,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=310980.0, ans=0.0 2024-08-10 02:05:39,654 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
15 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 02:06:05,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=311180.0, ans=0.125 2024-08-10 02:06:17,921 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 02:06:19,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=311280.0, ans=0.0 2024-08-10 02:06:23,174 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2150, loss[loss=0.115, beats_loss=0.01185, ecapa_loss=0.0003321, whisper_loss=0.09983, over 15495.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01259, ecapa_loss=0.0002973, whisper_loss=0.09781, over 3800022.85 frames. ], batch size: 63, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:06:35,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=311380.0, ans=0.0 2024-08-10 02:06:44,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311480.0, ans=0.1 2024-08-10 02:06:48,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=311480.0, ans=0.125 2024-08-10 02:06:50,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. 
limit=6.0 2024-08-10 02:07:01,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=311580.0, ans=0.0 2024-08-10 02:07:30,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=311780.0, ans=0.0 2024-08-10 02:07:35,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=311780.0, ans=0.125 2024-08-10 02:07:38,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2200, loss[loss=0.1222, beats_loss=0.0101, ecapa_loss=0.000327, whisper_loss=0.1088, over 20519.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01247, ecapa_loss=0.0002994, whisper_loss=0.09875, over 3821635.11 frames. ], batch size: 79, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:07:44,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311880.0, ans=0.1 2024-08-10 02:07:48,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=311880.0, ans=0.0 2024-08-10 02:07:48,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=311880.0, ans=0.125 2024-08-10 02:07:53,894 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 02:07:55,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.913e+01 3.407e+01 3.904e+01 7.612e+01, threshold=6.814e+01, percent-clipped=1.0 2024-08-10 02:07:55,790 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 02:08:08,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=312080.0, ans=0.0 2024-08-10 02:08:11,364 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 02:08:26,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=312180.0, ans=0.0 2024-08-10 02:08:40,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312280.0, ans=0.1 2024-08-10 02:08:50,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2250, loss[loss=0.1023, beats_loss=0.01353, ecapa_loss=0.0003018, whisper_loss=0.08576, over 15788.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01251, ecapa_loss=0.0003006, whisper_loss=0.09863, over 3802536.89 frames. ], batch size: 66, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:09:02,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2024-08-10 02:09:12,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312480.0, ans=0.1 2024-08-10 02:09:12,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=312480.0, ans=0.125 2024-08-10 02:09:18,006 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
16 from LS+wenet, 25 from Vox, 28 from AS
2024-08-10 02:09:19,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=312580.0, ans=0.025
2024-08-10 02:09:26,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=312580.0, ans=0.0
2024-08-10 02:09:27,905 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 12 from Vox, 34 from AS
2024-08-10 02:09:32,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=312580.0, ans=22.5
2024-08-10 02:09:52,411 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 23 from Vox, 29 from AS
2024-08-10 02:10:03,903 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2300, loss[loss=0.141, beats_loss=0.01116, ecapa_loss=0.0002764, whisper_loss=0.1271, over 19051.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01247, ecapa_loss=0.0002993, whisper_loss=0.09923, over 3848556.31 frames. ], batch size: 70, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:10:05,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=312880.0, ans=0.0
2024-08-10 02:10:11,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=312880.0, ans=0.125
2024-08-10 02:10:18,821 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 from AS
2024-08-10 02:10:21,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 3.060e+01 3.416e+01 3.893e+01 7.548e+01, threshold=6.833e+01, percent-clipped=2.0
2024-08-10 02:10:30,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=312980.0, ans=0.2
2024-08-10 02:10:55,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=313180.0, ans=0.125
2024-08-10 02:10:56,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313180.0, ans=0.1
2024-08-10 02:11:11,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=313280.0, ans=0.125
2024-08-10 02:11:14,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2350, loss[loss=0.1118, beats_loss=0.01497, ecapa_loss=0.0003023, whisper_loss=0.09377, over 18627.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01247, ecapa_loss=0.0002968, whisper_loss=0.09891, over 3858992.23 frames. ], batch size: 77, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:11:15,025 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS
2024-08-10 02:11:17,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=313380.0, ans=0.125
2024-08-10 02:11:49,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=313580.0, ans=0.0
2024-08-10 02:11:59,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313680.0, ans=0.1
2024-08-10 02:12:15,909 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 from AS
2024-08-10 02:12:25,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=313780.0, ans=0.07
2024-08-10 02:12:27,306 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 from AS
2024-08-10 02:12:28,477 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2400, loss[loss=0.1133, beats_loss=0.0134, ecapa_loss=0.0003098, whisper_loss=0.09676, over 21163.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01242, ecapa_loss=0.0002965, whisper_loss=0.09953, over 3855981.73 frames. ], batch size: 88, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:12:34,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=313880.0, ans=0.125
2024-08-10 02:12:35,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=313880.0, ans=0.125
2024-08-10 02:12:44,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.008e+01 3.355e+01 4.317e+01 6.888e+01, threshold=6.709e+01, percent-clipped=1.0
2024-08-10 02:12:59,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0
2024-08-10 02:13:35,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5
2024-08-10 02:13:40,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2450, loss[loss=0.1006, beats_loss=0.01402, ecapa_loss=0.000237, whisper_loss=0.08422, over 22224.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01248, ecapa_loss=0.0002959, whisper_loss=0.09861, over 3852372.95 frames. ], batch size: 87, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:13:47,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0
2024-08-10 02:13:58,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=314480.0, ans=0.125
2024-08-10 02:14:16,753 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS
2024-08-10 02:14:28,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=314680.0, ans=0.2
2024-08-10 02:14:37,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=314780.0, ans=0.125
2024-08-10 02:14:40,238 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 26 from LS+wenet, 11 from Vox, 25 from AS
2024-08-10 02:14:54,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2500, loss[loss=0.1116, beats_loss=0.01037, ecapa_loss=0.0003427, whisper_loss=0.09782, over 18324.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01241, ecapa_loss=0.0002972, whisper_loss=0.09958, over 3849267.38 frames. ], batch size: 70, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:15:12,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.053e+01 3.458e+01 4.005e+01 5.985e+01, threshold=6.915e+01, percent-clipped=0.0
2024-08-10 02:15:17,061 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 from AS
2024-08-10 02:15:20,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0
2024-08-10 02:15:38,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=315180.0, ans=0.125
2024-08-10 02:15:44,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=315180.0, ans=0.125
2024-08-10 02:15:47,374 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 02:15:50,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315180.0, ans=0.125
2024-08-10 02:15:56,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=315280.0, ans=0.125
2024-08-10 02:16:07,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2550, loss[loss=0.1059, beats_loss=0.01249, ecapa_loss=0.0002719, whisper_loss=0.09069, over 22589.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01247, ecapa_loss=0.0002958, whisper_loss=0.09934, over 3863551.10 frames. ], batch size: 93, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:16:09,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=315380.0, ans=0.125
2024-08-10 02:16:21,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315480.0, ans=0.1
2024-08-10 02:16:21,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=315480.0, ans=0.125
2024-08-10 02:16:24,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=315480.0, ans=0.125
2024-08-10 02:16:43,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=315580.0, ans=0.0
2024-08-10 02:17:00,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=315680.0, ans=0.125
2024-08-10 02:17:11,488 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 from AS
2024-08-10 02:17:13,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=315780.0, ans=0.125
2024-08-10 02:17:19,099 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 25 from Vox, 26 from AS
2024-08-10 02:17:20,776 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2600, loss[loss=0.1186, beats_loss=0.01081, ecapa_loss=0.0003608, whisper_loss=0.1042, over 19418.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01256, ecapa_loss=0.0002932, whisper_loss=0.09856, over 3851755.09 frames. ], batch size: 78, lr: 2.02e-02, grad_scale: 1048576.0
2024-08-10 02:17:31,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=315880.0, ans=0.125
2024-08-10 02:17:38,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.748e+01 3.170e+01 3.706e+01 6.461e+01, threshold=6.341e+01, percent-clipped=0.0
2024-08-10 02:17:40,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0
2024-08-10 02:17:41,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=315980.0, ans=0.125
2024-08-10 02:17:46,031 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 15 from Vox, 42 from AS
2024-08-10 02:17:52,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=316080.0, ans=0.125
2024-08-10 02:17:53,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=316080.0, ans=0.125
2024-08-10 02:18:01,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=316080.0, ans=0.0
2024-08-10 02:18:08,812 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 from AS
2024-08-10 02:18:12,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=316180.0, ans=0.035
2024-08-10 02:18:20,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0
2024-08-10 02:18:21,555 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 from AS
2024-08-10 02:18:30,733 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-10 02:18:38,119 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2650, loss[loss=0.1311, beats_loss=0.01205, ecapa_loss=0.0003233, whisper_loss=0.1159, over 23207.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01253, ecapa_loss=0.0002942, whisper_loss=0.099, over 3851220.20 frames. ], batch size: 94, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:18:57,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=316480.0, ans=0.125
2024-08-10 02:18:58,693 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 from AS
2024-08-10 02:19:13,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=316580.0, ans=0.0
2024-08-10 02:19:13,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0
2024-08-10 02:19:24,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0
2024-08-10 02:19:28,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316680.0, ans=0.125
2024-08-10 02:19:40,184 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 02:19:46,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=316780.0, ans=0.125
2024-08-10 02:19:54,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2700, loss[loss=0.1035, beats_loss=0.01432, ecapa_loss=0.0002498, whisper_loss=0.08667, over 19121.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01255, ecapa_loss=0.0002951, whisper_loss=0.09888, over 3860008.37 frames. ], batch size: 75, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:20:09,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=316980.0, ans=0.125
2024-08-10 02:20:11,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.947e+01 3.317e+01 3.968e+01 5.790e+01, threshold=6.635e+01, percent-clipped=0.0
2024-08-10 02:20:12,127 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 from AS
2024-08-10 02:20:18,680 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 02:20:18,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=316980.0, ans=0.125
2024-08-10 02:20:19,041 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.730e-01
2024-08-10 02:20:20,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=316980.0, ans=0.125
2024-08-10 02:20:23,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=317080.0, ans=0.2
2024-08-10 02:20:24,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317080.0, ans=0.125
2024-08-10 02:20:30,525 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 from AS
2024-08-10 02:20:46,887 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS
2024-08-10 02:21:07,871 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2750, loss[loss=0.09467, beats_loss=0.01448, ecapa_loss=0.0003152, whisper_loss=0.07704, over 20618.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01254, ecapa_loss=0.0002974, whisper_loss=0.09908, over 3865472.36 frames. ], batch size: 86, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:21:31,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=317480.0, ans=0.0
2024-08-10 02:21:38,595 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 22 from Vox, 35 from AS
2024-08-10 02:21:40,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5
2024-08-10 02:21:50,955 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS
2024-08-10 02:22:02,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=317680.0, ans=0.125
2024-08-10 02:22:05,670 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 from AS
2024-08-10 02:22:18,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317780.0, ans=0.1
2024-08-10 02:22:24,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2800, loss[loss=0.1308, beats_loss=0.01337, ecapa_loss=0.0003039, whisper_loss=0.1144, over 19408.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01248, ecapa_loss=0.0002962, whisper_loss=0.1005, over 3913749.62 frames. ], batch size: 78, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:22:32,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317880.0, ans=0.125
2024-08-10 02:22:35,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=317880.0, ans=0.0
2024-08-10 02:22:43,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.037e+01 3.440e+01 4.229e+01 1.125e+02, threshold=6.879e+01, percent-clipped=1.0
2024-08-10 02:22:52,268 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 13 from Vox, 32 from AS
2024-08-10 02:23:05,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=22.5
2024-08-10 02:23:08,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=318180.0, ans=0.0
2024-08-10 02:23:14,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=318180.0, ans=0.05
2024-08-10 02:23:31,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=318280.0, ans=0.125
2024-08-10 02:23:31,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=318280.0, ans=0.0
2024-08-10 02:23:31,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0
2024-08-10 02:23:34,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=318280.0, ans=0.125
2024-08-10 02:23:35,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=318280.0, ans=0.125
2024-08-10 02:23:39,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2850, loss[loss=0.1263, beats_loss=0.012, ecapa_loss=0.000316, whisper_loss=0.1112, over 19149.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01251, ecapa_loss=0.0002962, whisper_loss=0.1011, over 3927742.83 frames. ], batch size: 76, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:23:49,588 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 from AS
2024-08-10 02:24:08,113 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 from AS
2024-08-10 02:24:30,132 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 from AS
2024-08-10 02:24:50,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0
2024-08-10 02:24:57,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=318780.0, ans=0.0
2024-08-10 02:25:01,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2900, loss[loss=0.1026, beats_loss=0.01559, ecapa_loss=0.0003381, whisper_loss=0.08365, over 21036.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01254, ecapa_loss=0.0002991, whisper_loss=0.1001, over 3913725.30 frames. ], batch size: 91, lr: 2.01e-02, grad_scale: 1048576.0
2024-08-10 02:25:02,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=318880.0, ans=0.0
2024-08-10 02:25:06,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=318880.0, ans=0.0
2024-08-10 02:25:09,479 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS
2024-08-10 02:25:18,704 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 from AS
2024-08-10 02:25:18,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318980.0, ans=0.125
2024-08-10 02:25:19,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0
2024-08-10 02:25:19,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.971e+01 3.564e+01 4.159e+01 7.122e+01, threshold=7.127e+01, percent-clipped=1.0
2024-08-10 02:25:23,510 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS
2024-08-10 02:25:31,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319080.0, ans=0.1
2024-08-10 02:25:41,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0
2024-08-10 02:25:59,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319180.0, ans=0.125
2024-08-10 02:26:16,388 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS
2024-08-10 02:26:17,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 2950, loss[loss=0.1129, beats_loss=0.0111, ecapa_loss=0.0003064, whisper_loss=0.09873, over 22218.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01247, ecapa_loss=0.0002987, whisper_loss=0.09988, over 3919023.75 frames. ], batch size: 87, lr: 2.00e-02, grad_scale: 1048576.0
2024-08-10 02:26:44,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=319580.0, ans=0.2
2024-08-10 02:26:49,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=319580.0, ans=0.125
2024-08-10 02:27:07,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=319680.0, ans=0.125
2024-08-10 02:27:11,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5
2024-08-10 02:27:20,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=319780.0, ans=0.2
2024-08-10 02:27:24,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3000, loss[loss=0.1342, beats_loss=0.008844, ecapa_loss=0.0003649, whisper_loss=0.1217, over 21295.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01247, ecapa_loss=0.0003013, whisper_loss=0.1003, over 3925513.89 frames. ], batch size: 86, lr: 2.00e-02, grad_scale: 1048576.0
2024-08-10 02:27:24,010 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-10 02:28:04,509 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on ASR_libri: loss=0.2772, beats_loss=0, ecapa_loss=0.0008938, whisper_loss=0.2682, over 922467.00 frames.
2024-08-10 02:28:22,853 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on SV_voxceleb1: loss=0.007832, beats_loss=0, ecapa_loss=0.0007832, whisper_loss=0, over 939242.00 frames.
2024-08-10 02:30:19,768 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on AT_audioset: loss=0.02861, beats_loss=0.02861, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-10 02:30:19,771 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-10 02:30:22,976 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 from AS
2024-08-10 02:30:24,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=319880.0, ans=0.125
2024-08-10 02:30:38,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.871e+01 3.251e+01 3.853e+01 5.451e+01, threshold=6.502e+01, percent-clipped=0.0
2024-08-10 02:30:44,449 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS
2024-08-10 02:30:52,575 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS
2024-08-10 02:30:58,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=320080.0, ans=0.2
2024-08-10 02:31:10,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320180.0, ans=0.125
2024-08-10 02:31:14,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=320180.0, ans=0.0
2024-08-10 02:31:18,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320280.0, ans=0.125
2024-08-10 02:31:26,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=320280.0, ans=0.125
2024-08-10 02:31:30,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3050, loss[loss=0.1202, beats_loss=0.01065, ecapa_loss=0.0002896, whisper_loss=0.1067, over 19674.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01248, ecapa_loss=0.0003027, whisper_loss=0.1003, over 3936666.60 frames. ], batch size: 78, lr: 2.00e-02, grad_scale: 2097152.0
2024-08-10 02:31:41,821 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 from AS
2024-08-10 02:31:48,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=320480.0, ans=10.0
2024-08-10 02:31:49,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=12.0
2024-08-10 02:32:16,595 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 33 from LS+wenet, 12 from Vox, 33 from AS
2024-08-10 02:32:19,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
2024-08-10 02:32:36,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=320780.0, ans=0.125
2024-08-10 02:32:39,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3100, loss[loss=0.09832, beats_loss=0.01376, ecapa_loss=0.0002155, whisper_loss=0.0824, over 17871.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01248, ecapa_loss=0.0003008, whisper_loss=0.1002, over 3929714.31 frames. ], batch size: 68, lr: 2.00e-02, grad_scale: 2097152.0
2024-08-10 02:32:45,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320880.0, ans=0.125
2024-08-10 02:32:48,433 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 from AS
2024-08-10 02:32:54,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=320980.0, ans=0.0
2024-08-10 02:32:55,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.32 vs. limit=15.0
2024-08-10 02:32:56,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.934e+01 3.353e+01 3.892e+01 7.432e+01, threshold=6.707e+01, percent-clipped=2.0
2024-08-10 02:32:57,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=320980.0, ans=0.0
2024-08-10 02:33:34,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321280.0, ans=0.125
2024-08-10 02:33:37,650 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 28 from Vox, 33 from AS
2024-08-10 02:33:48,418 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3150, loss[loss=0.1181, beats_loss=0.01234, ecapa_loss=0.0003088, whisper_loss=0.1026, over 21957.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01236, ecapa_loss=0.0003009, whisper_loss=0.1004, over 3886508.20 frames. ], batch size: 91, lr: 2.00e-02, grad_scale: 2097152.0
2024-08-10 02:33:50,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=321380.0, ans=0.125
2024-08-10 02:34:07,522 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0
2024-08-10 02:34:13,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=321480.0, ans=0.0
2024-08-10 02:34:16,358 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-10 02:34:21,747 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 from AS
2024-08-10 02:34:35,457 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 13 from Vox, 41 from AS
2024-08-10 02:34:35,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=321680.0, ans=0.125
2024-08-10 02:34:45,479 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 35 from LS+wenet, 24 from Vox, 26 from AS
2024-08-10 02:34:53,165 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 from AS
2024-08-10 02:34:57,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3200, loss[loss=0.1085, beats_loss=0.01279, ecapa_loss=0.0002949, whisper_loss=0.09279, over 22787.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01225, ecapa_loss=0.000301, whisper_loss=0.1011, over 3843813.66 frames. ], batch size: 93, lr: 2.00e-02, grad_scale: 2097152.0
2024-08-10 02:35:04,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=321880.0, ans=0.0
2024-08-10 02:35:13,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.789e+01 3.261e+01 3.853e+01 5.155e+01, threshold=6.521e+01, percent-clipped=0.0
2024-08-10 02:35:17,813 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 14 from Vox, 41 from AS
2024-08-10 02:35:26,382 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 from AS
2024-08-10 02:35:29,056 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 from AS
2024-08-10 02:35:31,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=15.0
2024-08-10 02:35:32,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322080.0, ans=0.1
2024-08-10 02:35:32,962 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 from AS
2024-08-10 02:35:33,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=322080.0, ans=0.125
2024-08-10 02:35:54,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=322280.0, ans=0.0
2024-08-10 02:36:06,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3250, loss[loss=0.1328, beats_loss=0.01138, ecapa_loss=0.0002994, whisper_loss=0.1184, over 16540.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01227, ecapa_loss=0.000302, whisper_loss=0.1013, over 3834811.71 frames. ], batch size: 62, lr: 2.00e-02, grad_scale: 2097152.0
2024-08-10 02:36:16,348 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-10 02:36:27,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=322480.0, ans=0.125
2024-08-10 02:36:35,977 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 from AS
2024-08-10 02:36:56,622 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.079e+00
2024-08-10 02:36:59,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=322680.0, ans=0.04949747468305833
2024-08-10 02:37:15,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3300, loss[loss=0.1265, beats_loss=0.01381, ecapa_loss=0.00026, whisper_loss=0.1101, over 22009.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01231, ecapa_loss=0.0003013, whisper_loss=0.1012, over 3859624.54 frames. ], batch size: 86, lr: 1.99e-02, grad_scale: 2097152.0
2024-08-10 02:37:19,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=12.0
2024-08-10 02:37:23,526 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-10 02:37:31,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.021e+01 3.431e+01 4.015e+01 7.071e+01, threshold=6.862e+01, percent-clipped=2.0
2024-08-10 02:37:43,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323080.0, ans=0.1
2024-08-10 02:37:50,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=323080.0, ans=0.125
2024-08-10 02:37:57,610 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.75 vs. limit=22.5
2024-08-10 02:38:18,751 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 25 from Vox, 18 from AS
2024-08-10 02:38:23,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3350, loss[loss=0.1093, beats_loss=0.01062, ecapa_loss=0.0003144, whisper_loss=0.09554, over 13964.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01229, ecapa_loss=0.0002996, whisper_loss=0.1008, over 3853164.09 frames. ], batch size: 55, lr: 1.99e-02, grad_scale: 2097152.0
2024-08-10 02:38:24,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5
2024-08-10 02:38:36,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0
2024-08-10 02:38:47,021 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 from AS
2024-08-10 02:39:02,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=323680.0, ans=0.125
2024-08-10 02:39:25,890 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 from AS
2024-08-10 02:39:30,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=323880.0, ans=0.0
2024-08-10 02:39:31,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3400, loss[loss=0.105, beats_loss=0.0138, ecapa_loss=0.0002845, whisper_loss=0.08834, over 23299.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01239, ecapa_loss=0.0002974, whisper_loss=0.09934, over 3846910.31 frames. ], batch size: 94, lr: 1.99e-02, grad_scale: 2097152.0
2024-08-10 02:39:47,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 2.812e+01 3.293e+01 3.899e+01 6.283e+01, threshold=6.585e+01, percent-clipped=0.0
2024-08-10 02:39:51,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.19 vs. limit=10.0
2024-08-10 02:39:54,497 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 22 from Vox, 24 from AS
2024-08-10 02:39:58,843 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS
2024-08-10 02:40:07,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.85 vs. limit=22.5
2024-08-10 02:40:15,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=324180.0, ans=0.125
2024-08-10 02:40:39,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3450, loss[loss=0.1032, beats_loss=0.01406, ecapa_loss=0.0002483, whisper_loss=0.0867, over 19532.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.0125, ecapa_loss=0.0002983, whisper_loss=0.0987, over 3879550.34 frames. ], batch size: 79, lr: 1.99e-02, grad_scale: 2097152.0
2024-08-10 02:40:50,997 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 02:40:54,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324480.0, ans=0.1
2024-08-10 02:40:55,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=324480.0, ans=0.09899494936611666
2024-08-10 02:40:58,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=324480.0, ans=0.125
2024-08-10 02:40:58,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0
2024-08-10 02:40:59,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324480.0, ans=0.125
2024-08-10 02:41:05,007 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 from AS
2024-08-10 02:41:06,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=324580.0, ans=0.0
2024-08-10 02:41:18,811 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 17 from Vox, 17 from AS
2024-08-10 02:41:25,693 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS
2024-08-10 02:41:31,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=324680.0, ans=0.1
2024-08-10 02:41:33,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.30 vs.
limit=22.5 2024-08-10 02:41:34,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=324780.0, ans=0.125 2024-08-10 02:41:36,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=324780.0, ans=0.0 2024-08-10 02:41:47,643 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 02:41:48,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3500, loss[loss=0.08224, beats_loss=0.0151, ecapa_loss=0.0003062, whisper_loss=0.06408, over 13581.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01241, ecapa_loss=0.0003009, whisper_loss=0.09949, over 3863606.57 frames. ], batch size: 58, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:41:49,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=324880.0, ans=0.2 2024-08-10 02:42:05,291 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 3.058e+01 3.643e+01 4.338e+01 7.554e+01, threshold=7.285e+01, percent-clipped=1.0 2024-08-10 02:42:12,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=324980.0, ans=0.5 2024-08-10 02:42:22,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=325080.0, ans=0.125 2024-08-10 02:42:23,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=325080.0, ans=0.2 2024-08-10 02:42:41,358 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 02:42:43,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=325280.0, ans=0.125 2024-08-10 02:42:44,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-10 02:42:50,987 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 36 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 02:42:54,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=12.0 2024-08-10 02:42:57,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3550, loss[loss=0.1243, beats_loss=0.01397, ecapa_loss=0.0002361, whisper_loss=0.1079, over 22043.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01234, ecapa_loss=0.0002997, whisper_loss=0.09989, over 3894914.31 frames. ], batch size: 85, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:43:02,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 02:43:02,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=325380.0, ans=0.0 2024-08-10 02:43:08,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=325380.0, ans=0.125 2024-08-10 02:43:11,823 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 02:43:21,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. 
limit=6.0 2024-08-10 02:43:22,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325480.0, ans=0.125 2024-08-10 02:43:31,261 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 02:43:35,595 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 02:43:41,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.82 vs. limit=22.5 2024-08-10 02:43:46,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=325680.0, ans=0.0 2024-08-10 02:43:46,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=325680.0, ans=0.0 2024-08-10 02:43:48,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=325680.0, ans=0.1 2024-08-10 02:43:55,195 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 02:44:07,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3600, loss[loss=0.08886, beats_loss=0.01559, ecapa_loss=0.0002636, whisper_loss=0.07064, over 21686.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01223, ecapa_loss=0.0002997, whisper_loss=0.1002, over 3855707.22 frames. ], batch size: 89, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:44:23,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.942e+01 3.351e+01 3.815e+01 6.062e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 02:44:40,973 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 02:44:42,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=326080.0, ans=0.125 2024-08-10 02:44:47,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-10 02:44:49,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=326180.0, ans=0.0 2024-08-10 02:45:04,466 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 02:45:06,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=326280.0, ans=0.125 2024-08-10 02:45:10,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=326280.0, ans=0.125 2024-08-10 02:45:11,360 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 02:45:12,841 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 02:45:17,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3650, loss[loss=0.1298, beats_loss=0.01294, ecapa_loss=0.0002618, whisper_loss=0.1143, over 22276.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01225, ecapa_loss=0.0002997, whisper_loss=0.09997, over 3807084.72 frames. 
], batch size: 86, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:45:24,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=326380.0, ans=0.0 2024-08-10 02:45:39,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=326480.0, ans=0.95 2024-08-10 02:45:57,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=326680.0, ans=0.0 2024-08-10 02:46:03,679 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 02:46:10,471 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 02:46:14,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2024-08-10 02:46:25,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3700, loss[loss=0.1197, beats_loss=0.01097, ecapa_loss=0.0003389, whisper_loss=0.1054, over 18184.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01225, ecapa_loss=0.0003, whisper_loss=0.09993, over 3826347.29 frames. ], batch size: 73, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:46:26,338 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.677e-01 2024-08-10 02:46:34,223 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 02:46:34,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=326880.0, ans=0.0 2024-08-10 02:46:42,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.947e+01 3.360e+01 4.039e+01 7.794e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 02:46:48,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=326980.0, ans=0.2 2024-08-10 02:46:52,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=327080.0, ans=0.125 2024-08-10 02:47:00,465 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 22 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-10 02:47:08,443 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 02:47:11,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=327180.0, ans=0.0 2024-08-10 02:47:17,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=327180.0, ans=0.0 2024-08-10 02:47:19,488 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 02:47:30,135 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 02:47:33,271 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-10 02:47:33,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3750, loss[loss=0.1003, beats_loss=0.01234, ecapa_loss=0.0003008, whisper_loss=0.08496, over 21439.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01226, ecapa_loss=0.0002995, whisper_loss=0.1003, over 3827016.58 frames. 
], batch size: 88, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:47:35,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327380.0, ans=0.1 2024-08-10 02:47:47,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-08-10 02:48:12,537 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 02:48:19,676 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 02:48:19,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=327680.0, ans=0.1 2024-08-10 02:48:22,413 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 02:48:29,309 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 02:48:32,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=327780.0, ans=0.04949747468305833 2024-08-10 02:48:33,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327780.0, ans=0.1 2024-08-10 02:48:42,496 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3800, loss[loss=0.09934, beats_loss=0.01545, ecapa_loss=0.0002778, whisper_loss=0.08111, over 22848.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01229, ecapa_loss=0.0002998, whisper_loss=0.101, over 3876567.41 frames. ], batch size: 94, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:48:44,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. 
limit=15.0 2024-08-10 02:48:48,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=327880.0, ans=0.2 2024-08-10 02:48:51,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=327880.0, ans=0.125 2024-08-10 02:48:58,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.560e+01 3.072e+01 3.520e+01 3.991e+01 6.360e+01, threshold=7.040e+01, percent-clipped=0.0 2024-08-10 02:49:10,284 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 02:49:13,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=328080.0, ans=0.0 2024-08-10 02:49:20,445 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.766e+00 2024-08-10 02:49:41,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=328280.0, ans=0.125 2024-08-10 02:49:51,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3850, loss[loss=0.1264, beats_loss=0.0123, ecapa_loss=0.0003058, whisper_loss=0.1111, over 22496.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01238, ecapa_loss=0.0002982, whisper_loss=0.1008, over 3908778.59 frames. ], batch size: 92, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:49:53,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328380.0, ans=0.125 2024-08-10 02:49:57,497 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 02:50:12,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=328480.0, ans=0.125 2024-08-10 02:50:15,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=328480.0, ans=0.125 2024-08-10 02:50:31,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=328680.0, ans=0.0 2024-08-10 02:50:54,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=328780.0, ans=0.125 2024-08-10 02:50:57,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.79 vs. limit=10.0 2024-08-10 02:50:59,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3900, loss[loss=0.1004, beats_loss=0.0147, ecapa_loss=0.0003206, whisper_loss=0.08249, over 13578.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01242, ecapa_loss=0.000299, whisper_loss=0.1005, over 3894801.49 frames. ], batch size: 55, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:51:00,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=12.0 2024-08-10 02:51:15,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.142e+01 3.558e+01 4.007e+01 5.949e+01, threshold=7.115e+01, percent-clipped=0.0 2024-08-10 02:51:25,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2024-08-10 02:51:49,833 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 02:52:06,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-10 02:52:06,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 3950, loss[loss=0.1234, beats_loss=0.01057, ecapa_loss=0.0003362, whisper_loss=0.1094, over 19207.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01233, ecapa_loss=0.0003009, whisper_loss=0.1005, over 3880716.03 frames. ], batch size: 77, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:52:18,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.27 vs. limit=15.0 2024-08-10 02:52:32,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329580.0, ans=0.1 2024-08-10 02:52:33,883 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 32 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 02:52:38,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=329580.0, ans=0.125 2024-08-10 02:52:44,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=329580.0, ans=0.0 2024-08-10 02:52:50,061 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 02:52:53,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329680.0, ans=0.0 2024-08-10 02:53:07,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329780.0, ans=0.1 2024-08-10 02:53:11,421 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
32 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 02:53:12,757 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 31 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-10 02:53:13,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4000, loss[loss=0.1421, beats_loss=0.006967, ecapa_loss=0.0003428, whisper_loss=0.1317, over 17904.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01235, ecapa_loss=0.0002998, whisper_loss=0.1003, over 3880164.23 frames. ], batch size: 69, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:53:22,149 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 02:53:22,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=329880.0, ans=0.125 2024-08-10 02:53:30,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 3.066e+01 3.430e+01 3.923e+01 5.367e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-10 02:53:33,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329980.0, ans=0.0 2024-08-10 02:53:41,014 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 02:53:48,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=330080.0, ans=0.05 2024-08-10 02:53:55,104 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 02:54:08,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=330280.0, ans=0.125 2024-08-10 02:54:15,512 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
15 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 02:54:21,817 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4050, loss[loss=0.1462, beats_loss=0.01119, ecapa_loss=0.0002771, whisper_loss=0.1322, over 24127.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01235, ecapa_loss=0.0002977, whisper_loss=0.1009, over 3908494.53 frames. ], batch size: 91, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:54:32,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-08-10 02:54:42,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=330480.0, ans=0.2 2024-08-10 02:54:44,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=330480.0, ans=0.125 2024-08-10 02:54:58,448 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 02:55:01,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-10 02:55:02,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:02,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:05,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.77 vs. 
limit=22.5 2024-08-10 02:55:10,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=330680.0, ans=0.0 2024-08-10 02:55:12,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:15,810 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 02:55:17,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-08-10 02:55:25,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=330780.0, ans=0.0 2024-08-10 02:55:26,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=330780.0, ans=0.0 2024-08-10 02:55:28,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4100, loss[loss=0.1299, beats_loss=0.009861, ecapa_loss=0.0003569, whisper_loss=0.1165, over 14826.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01234, ecapa_loss=0.0002988, whisper_loss=0.1007, over 3896487.88 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:55:44,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.941e+01 3.171e+01 3.928e+01 6.026e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 02:55:57,572 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 02:56:04,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-10 02:56:17,533 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 02:56:19,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=331180.0, ans=0.0 2024-08-10 02:56:26,557 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 02:56:35,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4150, loss[loss=0.1313, beats_loss=0.01084, ecapa_loss=0.0003242, whisper_loss=0.1172, over 16394.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01228, ecapa_loss=0.0002983, whisper_loss=0.101, over 3891228.37 frames. ], batch size: 66, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:56:49,619 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 02:56:55,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=331480.0, ans=0.0 2024-08-10 02:57:06,798 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 02:57:11,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=331580.0, ans=0.125 2024-08-10 02:57:13,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=331580.0, ans=0.0 2024-08-10 02:57:21,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331680.0, ans=0.1 2024-08-10 02:57:32,168 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 02:57:33,290 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 02:57:34,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=331780.0, ans=0.125 2024-08-10 02:57:35,910 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 02:57:40,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=331780.0, ans=0.125 2024-08-10 02:57:42,328 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4200, loss[loss=0.1386, beats_loss=0.01073, ecapa_loss=0.0002943, whisper_loss=0.125, over 19247.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01241, ecapa_loss=0.0002977, whisper_loss=0.1004, over 3905950.13 frames. ], batch size: 75, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:57:55,054 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 02:57:57,577 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 02:57:58,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331980.0, ans=0.1 2024-08-10 02:57:58,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.978e+01 3.511e+01 4.145e+01 7.481e+01, threshold=7.022e+01, percent-clipped=3.0 2024-08-10 02:58:23,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332180.0, ans=0.1 2024-08-10 02:58:26,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=332180.0, ans=0.125 2024-08-10 02:58:31,069 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 02:58:38,925 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
28 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 02:58:45,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=332280.0, ans=0.125 2024-08-10 02:58:50,537 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4250, loss[loss=0.125, beats_loss=0.0104, ecapa_loss=0.0002797, whisper_loss=0.1118, over 23091.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01235, ecapa_loss=0.0002982, whisper_loss=0.1004, over 3915043.59 frames. ], batch size: 92, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:58:56,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=332380.0, ans=0.1 2024-08-10 02:59:17,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=332580.0, ans=0.0 2024-08-10 02:59:22,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2024-08-10 02:59:32,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=332680.0, ans=0.0 2024-08-10 02:59:41,793 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 02:59:51,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=332780.0, ans=0.2 2024-08-10 02:59:51,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332780.0, ans=0.1 2024-08-10 02:59:59,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4300, loss[loss=0.1367, beats_loss=0.01078, ecapa_loss=0.000301, whisper_loss=0.1229, over 23025.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01233, ecapa_loss=0.000298, whisper_loss=0.09973, over 3885780.11 frames. 
], batch size: 90, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 03:00:15,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.965e+01 3.329e+01 3.837e+01 6.258e+01, threshold=6.658e+01, percent-clipped=0.0 2024-08-10 03:00:22,554 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 03:00:26,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333080.0, ans=0.1 2024-08-10 03:00:41,894 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.180e+00 2024-08-10 03:00:51,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2024-08-10 03:01:06,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4350, loss[loss=0.1249, beats_loss=0.009703, ecapa_loss=0.000329, whisper_loss=0.1119, over 20374.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01234, ecapa_loss=0.0002983, whisper_loss=0.1002, over 3901790.73 frames. ], batch size: 79, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:01:06,937 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 03:01:29,756 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 03:01:40,580 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 03:01:54,264 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 03:02:07,776 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-10 03:02:13,932 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4400, loss[loss=0.1229, beats_loss=0.01291, ecapa_loss=0.0002384, whisper_loss=0.1076, over 18783.00 frames. 
], tot_loss[loss=0.1164, beats_loss=0.01231, ecapa_loss=0.0002951, whisper_loss=0.1011, over 3917411.21 frames. ], batch size: 71, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:02:22,370 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 03:02:22,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2024-08-10 03:02:26,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=333980.0, ans=0.0 2024-08-10 03:02:30,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.919e+01 3.248e+01 3.746e+01 6.587e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 03:02:32,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-10 03:02:33,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=333980.0, ans=0.125 2024-08-10 03:02:46,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=12.0 2024-08-10 03:02:56,055 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 03:02:56,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=334180.0, ans=0.125 2024-08-10 03:02:56,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.60 vs. 
limit=12.0 2024-08-10 03:03:07,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=334280.0, ans=0.0 2024-08-10 03:03:18,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=334280.0, ans=0.2 2024-08-10 03:03:22,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4450, loss[loss=0.1018, beats_loss=0.01465, ecapa_loss=0.0001975, whisper_loss=0.08516, over 16421.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01227, ecapa_loss=0.000296, whisper_loss=0.1003, over 3882966.11 frames. ], batch size: 63, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:03:23,513 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 03:03:25,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.01 vs. limit=10.0 2024-08-10 03:03:28,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=334380.0, ans=0.2 2024-08-10 03:03:39,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334480.0, ans=0.1 2024-08-10 03:03:41,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334480.0, ans=0.1 2024-08-10 03:03:52,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=334580.0, ans=0.0 2024-08-10 03:03:56,127 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 03:04:35,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4500, loss[loss=0.07309, beats_loss=0.0126, ecapa_loss=0.0003003, whisper_loss=0.05749, over 13085.00 frames. 
], tot_loss[loss=0.1147, beats_loss=0.01233, ecapa_loss=0.0002961, whisper_loss=0.09939, over 3871482.14 frames. ], batch size: 53, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:04:52,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+01 3.021e+01 3.497e+01 4.022e+01 7.846e+01, threshold=6.995e+01, percent-clipped=4.0 2024-08-10 03:05:07,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=335080.0, ans=0.0 2024-08-10 03:05:08,469 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-10 03:05:10,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-10 03:05:21,074 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 03:05:29,321 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 03:05:41,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=335280.0, ans=0.0 2024-08-10 03:05:46,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4550, loss[loss=0.1174, beats_loss=0.01195, ecapa_loss=0.0003096, whisper_loss=0.1023, over 22451.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01234, ecapa_loss=0.0002962, whisper_loss=0.09966, over 3905642.49 frames. ], batch size: 91, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:05:47,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=12.0 2024-08-10 03:05:47,874 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 03:05:59,122 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 03:06:26,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-08-10 03:06:34,767 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 03:06:36,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-10 03:06:46,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=335780.0, ans=0.2 2024-08-10 03:06:58,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4600, loss[loss=0.09339, beats_loss=0.01337, ecapa_loss=0.0002599, whisper_loss=0.07742, over 19877.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01237, ecapa_loss=0.0002963, whisper_loss=0.1, over 3912022.24 frames. ], batch size: 79, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:07:03,115 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 03:07:15,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.246e+01 3.685e+01 4.349e+01 7.107e+01, threshold=7.370e+01, percent-clipped=1.0 2024-08-10 03:07:35,130 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 03:07:37,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=336080.0, ans=15.0 2024-08-10 03:07:40,466 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 03:07:43,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=336180.0, ans=0.0 2024-08-10 03:07:44,708 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 03:07:50,858 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 03:07:55,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:08:00,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-08-10 03:08:05,278 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 03:08:10,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4650, loss[loss=0.1152, beats_loss=0.0114, ecapa_loss=0.0003349, whisper_loss=0.1004, over 20832.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01236, ecapa_loss=0.0002956, whisper_loss=0.1006, over 3918832.20 frames. ], batch size: 87, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:08:10,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=336380.0, ans=0.015 2024-08-10 03:08:17,610 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 03:08:35,490 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.568e-01 2024-08-10 03:08:38,205 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 03:08:42,077 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 03:09:09,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=336780.0, ans=0.125 2024-08-10 03:09:23,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4700, loss[loss=0.1138, beats_loss=0.01244, ecapa_loss=0.0002601, whisper_loss=0.09873, over 23284.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01232, ecapa_loss=0.0002937, whisper_loss=0.101, over 3907835.98 frames. ], batch size: 91, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:09:39,426 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 03:09:40,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 3.034e+01 3.372e+01 4.313e+01 2.367e+02, threshold=6.744e+01, percent-clipped=2.0 2024-08-10 03:09:40,665 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 03:10:24,398 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 03:10:26,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=337280.0, ans=0.125 2024-08-10 03:10:27,182 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 03:10:33,129 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:10:34,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4750, loss[loss=0.111, beats_loss=0.01094, ecapa_loss=0.000365, whisper_loss=0.09641, over 16142.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01236, ecapa_loss=0.0002928, whisper_loss=0.1, over 3894084.70 frames. 
], batch size: 67, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:10:34,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.21 vs. limit=22.5 2024-08-10 03:10:48,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-10 03:10:55,090 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 03:10:57,773 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 03:11:01,963 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 03:11:11,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=337580.0, ans=0.0 2024-08-10 03:11:20,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=337680.0, ans=0.125 2024-08-10 03:11:36,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2024-08-10 03:11:37,136 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 03:11:39,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=337780.0, ans=15.0 2024-08-10 03:11:40,113 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-10 03:11:41,878 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 03:11:47,150 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4800, loss[loss=0.1229, beats_loss=0.01388, ecapa_loss=0.0002451, whisper_loss=0.1066, over 18799.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01244, ecapa_loss=0.0002944, whisper_loss=0.09948, over 3887857.30 frames. ], batch size: 72, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:11:51,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=15.0 2024-08-10 03:12:03,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.006e+01 3.324e+01 3.733e+01 5.524e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-10 03:12:04,189 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 32 from Vox, 24 fro AS 2024-08-10 03:12:04,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=337980.0, ans=0.125 2024-08-10 03:12:06,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2024-08-10 03:12:10,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=337980.0, ans=0.2 2024-08-10 03:12:23,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=338080.0, ans=0.125 2024-08-10 03:12:26,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=338080.0, ans=0.0 2024-08-10 03:12:30,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=338180.0, ans=0.125 2024-08-10 03:12:44,458 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 03:12:49,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=338280.0, ans=0.125 2024-08-10 03:12:53,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=338280.0, ans=0.2 2024-08-10 03:12:57,206 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 03:12:58,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4850, loss[loss=0.1112, beats_loss=0.01477, ecapa_loss=0.0002213, whisper_loss=0.09423, over 17695.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01247, ecapa_loss=0.0002951, whisper_loss=0.09894, over 3900919.56 frames. ], batch size: 70, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:13:08,617 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.240e+00 2024-08-10 03:13:15,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=338480.0, ans=0.035 2024-08-10 03:13:22,930 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 03:13:39,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=338580.0, ans=0.125 2024-08-10 03:13:56,189 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 03:14:09,899 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 03:14:11,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4900, loss[loss=0.09203, beats_loss=0.01568, ecapa_loss=0.0002521, whisper_loss=0.07383, over 17678.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.0125, ecapa_loss=0.0002957, whisper_loss=0.09849, over 3904561.87 frames. 
], batch size: 70, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:14:16,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-08-10 03:14:17,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=338880.0, ans=0.09899494936611666 2024-08-10 03:14:28,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 3.105e+01 3.477e+01 3.938e+01 7.192e+01, threshold=6.955e+01, percent-clipped=1.0 2024-08-10 03:14:46,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-10 03:15:00,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2024-08-10 03:15:22,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 4950, loss[loss=0.1135, beats_loss=0.0155, ecapa_loss=0.0002589, whisper_loss=0.09538, over 23771.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01244, ecapa_loss=0.0002965, whisper_loss=0.09864, over 3895013.50 frames. ], batch size: 94, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:15:34,693 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 03:15:45,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.22 vs. limit=22.5 2024-08-10 03:15:57,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=339580.0, ans=0.1 2024-08-10 03:15:58,898 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-10 03:16:01,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=339580.0, ans=0.0 2024-08-10 03:16:07,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=339680.0, ans=0.0 2024-08-10 03:16:08,211 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 03:16:09,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.03 vs. limit=10.0 2024-08-10 03:16:20,476 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:16:21,366 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 03:16:22,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=339780.0, ans=0.125 2024-08-10 03:16:23,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2024-08-10 03:16:24,066 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 03:16:30,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339780.0, ans=0.125 2024-08-10 03:16:36,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5000, loss[loss=0.1177, beats_loss=0.01267, ecapa_loss=0.000314, whisper_loss=0.1019, over 21109.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01237, ecapa_loss=0.0002966, whisper_loss=0.09965, over 3915407.73 frames. 
], batch size: 89, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:16:45,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.27 vs. limit=12.0 2024-08-10 03:16:53,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.921e+01 3.372e+01 3.826e+01 7.563e+01, threshold=6.744e+01, percent-clipped=1.0 2024-08-10 03:17:17,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=31.82 vs. limit=22.5 2024-08-10 03:17:21,768 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 03:17:28,935 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 03:17:33,293 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:17:37,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=340280.0, ans=0.125 2024-08-10 03:17:40,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340280.0, ans=0.1 2024-08-10 03:17:48,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5050, loss[loss=0.1218, beats_loss=0.01382, ecapa_loss=0.0002608, whisper_loss=0.1054, over 15579.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01242, ecapa_loss=0.0002954, whisper_loss=0.1, over 3905293.24 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 4194304.0 2024-08-10 03:17:51,429 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 03:18:06,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=340480.0, ans=0.0 2024-08-10 03:18:08,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=340480.0, ans=0.0 2024-08-10 03:18:12,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340480.0, ans=0.125 2024-08-10 03:18:31,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=340680.0, ans=0.0 2024-08-10 03:18:33,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340680.0, ans=0.0 2024-08-10 03:18:33,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=340680.0, ans=0.0 2024-08-10 03:18:40,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-10 03:18:45,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=340780.0, ans=0.125 2024-08-10 03:19:01,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5100, loss[loss=0.1068, beats_loss=0.012, ecapa_loss=0.0002884, whisper_loss=0.09196, over 21441.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01243, ecapa_loss=0.0002942, whisper_loss=0.1005, over 3917765.32 frames. 
], batch size: 84, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:19:06,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=340880.0, ans=0.0 2024-08-10 03:19:16,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-10 03:19:19,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.974e+01 3.405e+01 3.841e+01 8.729e+01, threshold=6.810e+01, percent-clipped=2.0 2024-08-10 03:19:32,554 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 03:19:38,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=341080.0, ans=0.0 2024-08-10 03:19:40,500 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 03:19:56,389 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-10 03:20:17,342 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5150, loss[loss=0.1317, beats_loss=0.009809, ecapa_loss=0.0002834, whisper_loss=0.1191, over 15098.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01227, ecapa_loss=0.0002943, whisper_loss=0.1011, over 3908711.85 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:20:27,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=341380.0, ans=0.0 2024-08-10 03:20:28,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341380.0, ans=0.1 2024-08-10 03:20:47,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.05 vs. 
limit=22.5 2024-08-10 03:20:52,845 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 03:21:00,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=341580.0, ans=0.125 2024-08-10 03:21:16,733 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 03:21:17,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341780.0, ans=0.125 2024-08-10 03:21:32,996 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5200, loss[loss=0.112, beats_loss=0.011, ecapa_loss=0.0003364, whisper_loss=0.09765, over 18623.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01231, ecapa_loss=0.000292, whisper_loss=0.1006, over 3891405.08 frames. ], batch size: 77, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:21:44,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-08-10 03:21:49,785 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 03:21:51,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.930e+01 3.270e+01 3.996e+01 6.105e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-10 03:21:57,303 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 03:21:59,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2024-08-10 03:22:11,689 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 03:22:13,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342080.0, ans=0.125 2024-08-10 03:22:16,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=342180.0, ans=0.025 2024-08-10 03:22:32,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=342280.0, ans=0.0 2024-08-10 03:22:32,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342280.0, ans=0.0 2024-08-10 03:22:46,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5250, loss[loss=0.1279, beats_loss=0.0123, ecapa_loss=0.0003051, whisper_loss=0.1126, over 17178.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01242, ecapa_loss=0.0002906, whisper_loss=0.09967, over 3882193.56 frames. ], batch size: 66, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:22:59,263 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 03:23:21,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342580.0, ans=0.125 2024-08-10 03:23:24,452 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 03:23:26,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=342580.0, ans=0.125 2024-08-10 03:23:40,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=342680.0, ans=0.125 2024-08-10 03:23:43,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342680.0, ans=0.125 2024-08-10 03:23:46,629 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 03:24:00,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2024-08-10 03:24:02,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5300, loss[loss=0.1295, beats_loss=0.009767, ecapa_loss=0.000309, whisper_loss=0.1167, over 22132.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01231, ecapa_loss=0.0002907, whisper_loss=0.1008, over 3903337.96 frames. ], batch size: 88, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:24:13,086 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 03:24:20,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.874e+01 3.315e+01 3.923e+01 7.752e+01, threshold=6.630e+01, percent-clipped=2.0 2024-08-10 03:24:33,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343080.0, ans=0.1 2024-08-10 03:24:50,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=343180.0, ans=0.0 2024-08-10 03:24:58,295 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-10 03:24:58,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2024-08-10 03:25:15,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5350, loss[loss=0.1378, beats_loss=0.01132, ecapa_loss=0.0002247, whisper_loss=0.1242, over 23614.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.0123, ecapa_loss=0.00029, whisper_loss=0.1012, over 3910478.37 frames. ], batch size: 89, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:25:20,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343380.0, ans=0.1 2024-08-10 03:25:26,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=343380.0, ans=0.0 2024-08-10 03:25:36,735 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 03:25:38,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343480.0, ans=0.1 2024-08-10 03:26:02,767 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-10 03:26:17,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343780.0, ans=0.125 2024-08-10 03:26:29,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=343780.0, ans=0.125 2024-08-10 03:26:32,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5400, loss[loss=0.1169, beats_loss=0.01415, ecapa_loss=0.0002189, whisper_loss=0.1005, over 19735.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01223, ecapa_loss=0.000289, whisper_loss=0.1012, over 3906599.04 frames. 
], batch size: 76, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:26:48,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-10 03:26:50,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.952e+01 3.404e+01 3.987e+01 5.856e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 03:26:55,975 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 03:26:57,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2024-08-10 03:27:00,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-10 03:27:03,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344080.0, ans=0.125 2024-08-10 03:27:03,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=344080.0, ans=0.0 2024-08-10 03:27:11,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=344080.0, ans=0.0 2024-08-10 03:27:20,235 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 03:27:29,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=344180.0, ans=0.125 2024-08-10 03:27:29,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=344180.0, ans=0.125 2024-08-10 03:27:36,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344280.0, ans=0.125 2024-08-10 03:27:46,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0 2024-08-10 03:27:46,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5450, loss[loss=0.1017, beats_loss=0.01492, ecapa_loss=0.0002948, whisper_loss=0.08381, over 21090.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01222, ecapa_loss=0.0002884, whisper_loss=0.1019, over 3882729.25 frames. ], batch size: 90, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:28:08,787 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 03:28:27,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=12.0 2024-08-10 03:28:44,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=344680.0, ans=0.0 2024-08-10 03:28:57,642 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 03:29:01,061 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-10 03:29:01,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=344780.0, ans=0.125 2024-08-10 03:29:03,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5500, loss[loss=0.0945, beats_loss=0.01215, ecapa_loss=0.0002934, whisper_loss=0.07941, over 21946.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01229, ecapa_loss=0.0002879, whisper_loss=0.1014, over 3874179.52 frames. ], batch size: 88, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:29:07,643 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 31 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 03:29:19,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344980.0, ans=0.1 2024-08-10 03:29:22,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.947e+01 3.297e+01 3.879e+01 5.625e+01, threshold=6.594e+01, percent-clipped=0.0 2024-08-10 03:29:49,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=345180.0, ans=0.0 2024-08-10 03:29:57,197 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 03:30:00,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=345180.0, ans=0.0 2024-08-10 03:30:13,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=345280.0, ans=0.2 2024-08-10 03:30:19,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5550, loss[loss=0.1276, beats_loss=0.01081, ecapa_loss=0.0002826, whisper_loss=0.1139, over 23069.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01232, ecapa_loss=0.0002875, whisper_loss=0.1015, over 3898637.57 frames. 
], batch size: 92, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:30:21,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=345380.0, ans=0.0 2024-08-10 03:30:31,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=12.0 2024-08-10 03:30:58,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345580.0, ans=0.1 2024-08-10 03:31:12,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-10 03:31:30,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=345780.0, ans=0.2 2024-08-10 03:31:35,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5600, loss[loss=0.1059, beats_loss=0.0128, ecapa_loss=0.0003051, whisper_loss=0.09007, over 15337.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01237, ecapa_loss=0.0002895, whisper_loss=0.1005, over 3884435.12 frames. ], batch size: 65, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:31:40,349 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 03:31:42,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=345880.0, ans=0.07 2024-08-10 03:31:51,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2024-08-10 03:31:53,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.950e+01 3.327e+01 3.865e+01 5.194e+01, threshold=6.655e+01, percent-clipped=0.0 2024-08-10 03:32:12,895 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 03:32:30,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346180.0, ans=0.0 2024-08-10 03:32:37,794 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 03:32:39,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346280.0, ans=0.1 2024-08-10 03:32:49,688 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5650, loss[loss=0.08851, beats_loss=0.01299, ecapa_loss=0.0003124, whisper_loss=0.0724, over 14925.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01232, ecapa_loss=0.0002928, whisper_loss=0.1001, over 3850932.26 frames. ], batch size: 62, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:32:53,606 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 03:32:53,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=346380.0, ans=0.125 2024-08-10 03:33:08,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=346480.0, ans=0.125 2024-08-10 03:33:15,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346480.0, ans=0.1 2024-08-10 03:33:43,908 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 03:33:47,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=346680.0, ans=0.2 2024-08-10 03:33:49,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=346780.0, ans=0.125 2024-08-10 03:34:02,473 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 03:34:05,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5700, loss[loss=0.09735, beats_loss=0.009689, ecapa_loss=0.0003197, whisper_loss=0.08446, over 20284.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01231, ecapa_loss=0.0002967, whisper_loss=0.1, over 3854813.37 frames. ], batch size: 79, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:34:18,882 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 03:34:18,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=346980.0, ans=0.125 2024-08-10 03:34:23,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.923e+01 3.363e+01 4.122e+01 7.176e+01, threshold=6.726e+01, percent-clipped=2.0 2024-08-10 03:34:23,648 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 03:34:41,031 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-10 03:34:43,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-10 03:34:44,594 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 03:34:46,320 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 03:34:58,782 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 03:35:19,120 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 03:35:22,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5750, loss[loss=0.1337, beats_loss=0.01138, ecapa_loss=0.000284, whisper_loss=0.1194, over 17587.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01241, ecapa_loss=0.0002948, whisper_loss=0.0994, over 3831991.11 frames. ], batch size: 67, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:35:52,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347580.0, ans=0.1 2024-08-10 03:36:06,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347680.0, ans=0.1 2024-08-10 03:36:20,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=347780.0, ans=0.0 2024-08-10 03:36:20,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=347780.0, ans=0.2 2024-08-10 03:36:36,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5800, loss[loss=0.1378, beats_loss=0.009845, ecapa_loss=0.00034, whisper_loss=0.1246, over 21081.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.0124, ecapa_loss=0.0002971, whisper_loss=0.09895, over 3819016.61 frames. ], batch size: 88, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:36:36,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=347880.0, ans=0.125 2024-08-10 03:36:43,984 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 03:36:54,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.824e+01 3.290e+01 3.735e+01 8.555e+01, threshold=6.581e+01, percent-clipped=2.0 2024-08-10 03:36:59,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=347980.0, ans=0.125 2024-08-10 03:37:08,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348080.0, ans=0.1 2024-08-10 03:37:10,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=348080.0, ans=0.125 2024-08-10 03:37:26,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348180.0, ans=0.125 2024-08-10 03:37:26,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=348180.0, ans=0.125 2024-08-10 03:37:27,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=348180.0, ans=0.125 2024-08-10 03:37:34,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2024-08-10 03:37:38,198 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 03:37:42,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348280.0, ans=0.1 2024-08-10 03:37:45,186 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-10 03:37:50,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5850, loss[loss=0.09359, beats_loss=0.01495, ecapa_loss=0.000273, whisper_loss=0.07591, over 19080.00 frames. 
], tot_loss[loss=0.1146, beats_loss=0.01234, ecapa_loss=0.0002984, whisper_loss=0.0993, over 3825427.21 frames. ], batch size: 77, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:38:23,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=348580.0, ans=0.0 2024-08-10 03:38:25,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348580.0, ans=0.1 2024-08-10 03:38:37,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=348680.0, ans=0.0 2024-08-10 03:38:40,434 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 17 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 03:38:45,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=348680.0, ans=0.125 2024-08-10 03:38:45,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=348680.0, ans=0.0 2024-08-10 03:38:54,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2024-08-10 03:39:00,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5900, loss[loss=0.1052, beats_loss=0.01024, ecapa_loss=0.0003043, whisper_loss=0.09188, over 18133.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01232, ecapa_loss=0.0002968, whisper_loss=0.0994, over 3795218.50 frames. ], batch size: 73, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:39:01,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=23.19 vs. limit=15.0 2024-08-10 03:39:07,675 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 03:39:15,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=348980.0, ans=0.125 2024-08-10 03:39:16,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.929e+01 3.311e+01 3.794e+01 5.610e+01, threshold=6.621e+01, percent-clipped=0.0 2024-08-10 03:39:18,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=348980.0, ans=0.125 2024-08-10 03:39:21,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=348980.0, ans=0.125 2024-08-10 03:39:29,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-10 03:39:32,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349080.0, ans=0.125 2024-08-10 03:39:44,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=349180.0, ans=0.1 2024-08-10 03:40:05,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=349280.0, ans=0.2 2024-08-10 03:40:06,969 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-10 03:40:08,293 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 03:40:09,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 5950, loss[loss=0.08841, beats_loss=0.01543, ecapa_loss=0.0002661, whisper_loss=0.07032, over 18813.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01242, ecapa_loss=0.0002951, whisper_loss=0.09842, over 3780684.41 frames. 
], batch size: 80, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:40:15,242 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 03:40:24,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=349480.0, ans=0.2 2024-08-10 03:40:36,446 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 03:40:39,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=349580.0, ans=0.015 2024-08-10 03:40:46,208 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 03:41:09,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349780.0, ans=0.1 2024-08-10 03:41:10,737 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 03:41:12,045 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 03:41:18,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6000, loss[loss=0.122, beats_loss=0.01041, ecapa_loss=0.0003537, whisper_loss=0.1081, over 19730.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01245, ecapa_loss=0.0002927, whisper_loss=0.09801, over 3812987.59 frames. ], batch size: 80, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:41:18,607 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 03:41:57,771 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on ASR_libri: loss=0.2761, beats_loss=0, ecapa_loss=0.0008742, whisper_loss=0.2674, over 922467.00 frames. 2024-08-10 03:42:15,656 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on SV_voxceleb1: loss=0.007667, beats_loss=0, ecapa_loss=0.0007667, whisper_loss=0, over 939242.00 frames. 
2024-08-10 03:44:14,918 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on AT_audioset: loss=0.0285, beats_loss=0.0285, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 03:44:14,922 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 03:44:23,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=349880.0, ans=0.035 2024-08-10 03:44:26,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=349880.0, ans=0.025 2024-08-10 03:44:31,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2024-08-10 03:44:32,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 3.043e+01 3.498e+01 4.267e+01 5.483e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-10 03:44:38,288 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 03:44:59,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=350180.0, ans=0.0 2024-08-10 03:45:10,636 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 03:45:26,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6050, loss[loss=0.1144, beats_loss=0.01176, ecapa_loss=0.0002935, whisper_loss=0.09968, over 22238.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01243, ecapa_loss=0.0002943, whisper_loss=0.09848, over 3823051.71 frames. ], batch size: 91, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:45:35,295 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 03:45:48,006 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 03:46:06,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2024-08-10 03:46:18,989 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 03:46:34,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=350780.0, ans=0.125 2024-08-10 03:46:36,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6100, loss[loss=0.1255, beats_loss=0.01218, ecapa_loss=0.0003916, whisper_loss=0.1094, over 21328.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01243, ecapa_loss=0.0002936, whisper_loss=0.09832, over 3846002.47 frames. ], batch size: 89, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:46:42,248 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 03:46:53,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.971e+01 3.424e+01 4.102e+01 1.085e+02, threshold=6.848e+01, percent-clipped=1.0 2024-08-10 03:46:57,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=350980.0, ans=0.125 2024-08-10 03:47:45,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6150, loss[loss=0.1364, beats_loss=0.01117, ecapa_loss=0.000229, whisper_loss=0.123, over 21449.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01237, ecapa_loss=0.0002938, whisper_loss=0.09951, over 3887521.14 frames. ], batch size: 78, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:47:47,670 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 03:47:51,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=351380.0, ans=0.2 2024-08-10 03:48:03,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=351480.0, ans=0.125 2024-08-10 03:48:05,713 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 03:48:12,830 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 03:48:16,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-10 03:48:32,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=351680.0, ans=0.0 2024-08-10 03:48:37,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=351680.0, ans=0.125 2024-08-10 03:48:39,230 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.667e+00 2024-08-10 03:48:41,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=351780.0, ans=0.04949747468305833 2024-08-10 03:48:54,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6200, loss[loss=0.09661, beats_loss=0.0124, ecapa_loss=0.0003129, whisper_loss=0.08109, over 21255.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01239, ecapa_loss=0.0002912, whisper_loss=0.09915, over 3892798.91 frames. ], batch size: 89, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:48:55,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.46 vs. 
limit=12.0 2024-08-10 03:49:00,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=351880.0, ans=0.07 2024-08-10 03:49:03,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=351880.0, ans=0.0 2024-08-10 03:49:11,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.480e+01 3.031e+01 3.409e+01 3.924e+01 5.999e+01, threshold=6.819e+01, percent-clipped=0.0 2024-08-10 03:49:14,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=351980.0, ans=0.125 2024-08-10 03:49:20,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=351980.0, ans=0.1 2024-08-10 03:49:42,682 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 03:49:48,410 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-10 03:49:56,545 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 03:50:02,891 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6250, loss[loss=0.09888, beats_loss=0.01404, ecapa_loss=0.0002487, whisper_loss=0.08235, over 14020.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01248, ecapa_loss=0.000291, whisper_loss=0.09808, over 3877514.95 frames. 
], batch size: 56, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:50:32,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=352580.0, ans=0.0 2024-08-10 03:50:37,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=352580.0, ans=0.0 2024-08-10 03:50:41,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=352580.0, ans=0.0 2024-08-10 03:50:44,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=352680.0, ans=0.125 2024-08-10 03:50:52,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=352680.0, ans=0.125 2024-08-10 03:50:56,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=352780.0, ans=0.125 2024-08-10 03:50:58,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352780.0, ans=0.1 2024-08-10 03:50:59,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-10 03:51:04,388 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 03:51:10,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6300, loss[loss=0.1111, beats_loss=0.01195, ecapa_loss=0.0003481, whisper_loss=0.0957, over 21435.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01245, ecapa_loss=0.000293, whisper_loss=0.09868, over 3873517.45 frames. 
], batch size: 89, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:51:15,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=352880.0, ans=0.1 2024-08-10 03:51:15,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=352880.0, ans=0.125 2024-08-10 03:51:23,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=352980.0, ans=15.0 2024-08-10 03:51:27,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 3.057e+01 3.444e+01 4.179e+01 1.718e+02, threshold=6.888e+01, percent-clipped=1.0 2024-08-10 03:51:38,451 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 03:51:43,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-10 03:51:45,200 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 03:51:52,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=353180.0, ans=0.125 2024-08-10 03:51:57,748 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 03:51:57,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=353180.0, ans=0.5 2024-08-10 03:52:05,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-10 03:52:06,318 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 03:52:19,401 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6350, loss[loss=0.127, beats_loss=0.009161, ecapa_loss=0.000334, whisper_loss=0.1145, over 22793.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01238, ecapa_loss=0.0002957, whisper_loss=0.09928, over 3888300.32 frames. ], batch size: 93, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:52:22,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=353380.0, ans=0.0 2024-08-10 03:52:34,764 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 03:52:54,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=353580.0, ans=0.125 2024-08-10 03:53:01,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=353680.0, ans=0.125 2024-08-10 03:53:02,490 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 03:53:28,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6400, loss[loss=0.1154, beats_loss=0.01427, ecapa_loss=0.0003024, whisper_loss=0.09815, over 22778.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01233, ecapa_loss=0.0002959, whisper_loss=0.09954, over 3865306.96 frames. ], batch size: 94, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:53:35,475 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 03:53:42,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=353980.0, ans=0.125 2024-08-10 03:53:44,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.873e+01 3.233e+01 3.602e+01 5.742e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-10 03:53:59,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=354080.0, ans=10.0 2024-08-10 03:54:14,641 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=12.0 2024-08-10 03:54:15,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=354180.0, ans=0.125 2024-08-10 03:54:15,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-10 03:54:23,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=354280.0, ans=0.0 2024-08-10 03:54:37,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6450, loss[loss=0.09779, beats_loss=0.01191, ecapa_loss=0.0002854, whisper_loss=0.08302, over 16021.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01235, ecapa_loss=0.0002933, whisper_loss=0.1001, over 3892705.01 frames. ], batch size: 62, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:54:41,283 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 03:54:42,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=354380.0, ans=0.125 2024-08-10 03:54:47,047 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
21 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-10 03:55:01,184 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 03:55:05,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=354580.0, ans=0.2 2024-08-10 03:55:11,131 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 03:55:11,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2024-08-10 03:55:12,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=354580.0, ans=0.125 2024-08-10 03:55:14,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=354580.0, ans=10.0 2024-08-10 03:55:30,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=354680.0, ans=0.125 2024-08-10 03:55:30,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2024-08-10 03:55:42,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=354780.0, ans=0.0 2024-08-10 03:55:45,996 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6500, loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0003569, whisper_loss=0.08999, over 13494.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01233, ecapa_loss=0.0002922, whisper_loss=0.1008, over 3905488.35 frames. 
], batch size: 56, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:56:02,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.914e+01 3.314e+01 3.758e+01 6.768e+01, threshold=6.629e+01, percent-clipped=1.0 2024-08-10 03:56:03,507 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 03:56:24,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=355080.0, ans=0.125 2024-08-10 03:56:25,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=355180.0, ans=0.0 2024-08-10 03:56:28,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=355180.0, ans=0.07 2024-08-10 03:56:29,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=355180.0, ans=0.0 2024-08-10 03:56:42,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355280.0, ans=0.1 2024-08-10 03:56:53,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6550, loss[loss=0.1032, beats_loss=0.01734, ecapa_loss=0.0002506, whisper_loss=0.08333, over 18320.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01235, ecapa_loss=0.0002913, whisper_loss=0.1005, over 3926131.53 frames. ], batch size: 76, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:57:00,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. 
limit=15.0 2024-08-10 03:57:23,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=355580.0, ans=0.125 2024-08-10 03:57:57,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=355780.0, ans=0.95 2024-08-10 03:58:01,560 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6600, loss[loss=0.1271, beats_loss=0.01123, ecapa_loss=0.0002848, whisper_loss=0.1131, over 22633.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01237, ecapa_loss=0.0002929, whisper_loss=0.1003, over 3967089.13 frames. ], batch size: 88, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:58:18,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.136e+01 3.510e+01 4.053e+01 6.821e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-10 03:58:21,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=355980.0, ans=0.125 2024-08-10 03:58:29,215 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 03:59:01,065 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 03:59:10,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6650, loss[loss=0.1458, beats_loss=0.009699, ecapa_loss=0.0002617, whisper_loss=0.1335, over 22869.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01238, ecapa_loss=0.000293, whisper_loss=0.1008, over 3971678.74 frames. ], batch size: 83, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:59:23,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=356480.0, ans=0.0 2024-08-10 03:59:46,263 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
23 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-10 03:59:49,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356580.0, ans=0.1 2024-08-10 03:59:50,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=356680.0, ans=0.125 2024-08-10 03:59:59,645 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 04:00:11,151 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 04:00:19,740 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6700, loss[loss=0.118, beats_loss=0.01439, ecapa_loss=0.0002936, whisper_loss=0.1007, over 21896.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01238, ecapa_loss=0.0002926, whisper_loss=0.1006, over 3961092.39 frames. ], batch size: 92, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:00:22,584 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 04:00:26,940 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 04:00:35,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.886e+01 3.265e+01 3.693e+01 7.385e+01, threshold=6.529e+01, percent-clipped=1.0 2024-08-10 04:00:39,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356980.0, ans=0.1 2024-08-10 04:00:42,143 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 04:01:09,321 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 04:01:09,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=357180.0, ans=0.125 2024-08-10 04:01:24,905 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-10 04:01:28,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6750, loss[loss=0.1013, beats_loss=0.01376, ecapa_loss=0.0003378, whisper_loss=0.08415, over 18320.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01227, ecapa_loss=0.0002939, whisper_loss=0.1008, over 3931040.65 frames. ], batch size: 77, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:01:36,684 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 04:01:39,634 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 04:01:48,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=357480.0, ans=0.125 2024-08-10 04:01:52,096 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-10 04:01:57,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=357580.0, ans=0.125 2024-08-10 04:02:17,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=357680.0, ans=0.125 2024-08-10 04:02:37,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6800, loss[loss=0.1498, beats_loss=0.008589, ecapa_loss=0.0003102, whisper_loss=0.1381, over 16001.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01234, ecapa_loss=0.0002973, whisper_loss=0.09998, over 3919001.34 frames. ], batch size: 60, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:02:38,750 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 04:02:45,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2024-08-10 04:02:54,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.970e+01 3.321e+01 3.801e+01 1.301e+02, threshold=6.643e+01, percent-clipped=3.0 2024-08-10 04:02:55,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=357980.0, ans=0.0 2024-08-10 04:03:11,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=358080.0, ans=0.125 2024-08-10 04:03:24,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=358180.0, ans=0.1 2024-08-10 04:03:28,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-08-10 04:03:29,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=358180.0, ans=0.125 2024-08-10 04:03:40,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=358280.0, ans=0.2 2024-08-10 04:03:45,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=358380.0, ans=0.125 2024-08-10 04:03:46,618 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6850, loss[loss=0.1109, beats_loss=0.01244, ecapa_loss=0.0003339, whisper_loss=0.09513, over 17705.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01233, ecapa_loss=0.0002956, whisper_loss=0.09954, over 3903277.57 frames. 
], batch size: 71, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:04:00,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-10 04:04:11,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358480.0, ans=0.125 2024-08-10 04:04:11,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=12.0 2024-08-10 04:04:31,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-10 04:04:54,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6900, loss[loss=0.1097, beats_loss=0.01212, ecapa_loss=0.0002832, whisper_loss=0.09479, over 20066.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.0124, ecapa_loss=0.0002933, whisper_loss=0.09905, over 3878712.04 frames. ], batch size: 82, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:05:02,928 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 04:05:07,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=358980.0, ans=0.0 2024-08-10 04:05:10,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.982e+01 3.330e+01 3.890e+01 5.660e+01, threshold=6.660e+01, percent-clipped=0.0 2024-08-10 04:05:25,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-08-10 04:05:46,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.04 vs. 
limit=22.5 2024-08-10 04:06:02,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-08-10 04:06:03,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 6950, loss[loss=0.09838, beats_loss=0.01473, ecapa_loss=0.0002591, whisper_loss=0.08105, over 17225.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01247, ecapa_loss=0.0002918, whisper_loss=0.09854, over 3883466.23 frames. ], batch size: 69, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:06:09,534 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 04:06:09,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=359380.0, ans=0.125 2024-08-10 04:06:12,205 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-10 04:06:13,681 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 04:06:14,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=359380.0, ans=0.125 2024-08-10 04:06:22,570 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-10 04:06:32,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=8.0 2024-08-10 04:06:33,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=359580.0, ans=0.0 2024-08-10 04:06:34,663 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 04:06:40,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=359580.0, ans=0.0 2024-08-10 04:06:51,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=359680.0, ans=0.125 2024-08-10 04:06:53,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-10 04:07:10,737 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 04:07:13,280 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7000, loss[loss=0.1108, beats_loss=0.01397, ecapa_loss=0.0003251, whisper_loss=0.09356, over 20329.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01243, ecapa_loss=0.000292, whisper_loss=0.09889, over 3852726.36 frames. ], batch size: 84, lr: 1.89e-02, grad_scale: 4194304.0 2024-08-10 04:07:32,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.852e+01 3.263e+01 3.844e+01 5.295e+01, threshold=6.525e+01, percent-clipped=0.0 2024-08-10 04:07:33,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=359980.0, ans=0.0 2024-08-10 04:07:37,101 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 04:07:42,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360080.0, ans=0.125 2024-08-10 04:07:58,997 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 04:08:07,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360180.0, ans=0.0 2024-08-10 04:08:07,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=360180.0, ans=0.125 2024-08-10 04:08:18,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=360280.0, ans=0.2 2024-08-10 04:08:23,093 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 04:08:24,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7050, loss[loss=0.1209, beats_loss=0.01081, ecapa_loss=0.0002824, whisper_loss=0.1073, over 17333.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01243, ecapa_loss=0.0002927, whisper_loss=0.0988, over 3866724.68 frames. ], batch size: 70, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:08:36,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=360380.0, ans=0.0 2024-08-10 04:08:48,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=360480.0, ans=0.0 2024-08-10 04:08:50,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.42 vs. limit=15.0 2024-08-10 04:09:03,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=360580.0, ans=0.125 2024-08-10 04:09:11,544 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 04:09:20,820 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 04:09:32,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7100, loss[loss=0.09534, beats_loss=0.01337, ecapa_loss=0.000236, whisper_loss=0.07961, over 22045.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01242, ecapa_loss=0.0002919, whisper_loss=0.09886, over 3885234.40 frames. ], batch size: 90, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:09:32,829 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 04:09:33,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360880.0, ans=0.125 2024-08-10 04:09:45,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360980.0, ans=0.125 2024-08-10 04:09:48,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.064e+01 3.569e+01 4.090e+01 1.167e+02, threshold=7.137e+01, percent-clipped=2.0 2024-08-10 04:10:36,535 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 04:10:41,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7150, loss[loss=0.1093, beats_loss=0.01409, ecapa_loss=0.0002822, whisper_loss=0.09242, over 23275.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01241, ecapa_loss=0.0002903, whisper_loss=0.09896, over 3879352.27 frames. ], batch size: 94, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:10:56,899 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 04:11:04,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=361480.0, ans=0.2 2024-08-10 04:11:23,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361680.0, ans=0.1 2024-08-10 04:11:33,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361680.0, ans=0.0 2024-08-10 04:11:40,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=361780.0, ans=0.125 2024-08-10 04:11:44,371 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 32 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 04:11:45,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=361780.0, ans=0.125 2024-08-10 04:11:50,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7200, loss[loss=0.1085, beats_loss=0.01377, ecapa_loss=0.0003016, whisper_loss=0.09173, over 22224.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01236, ecapa_loss=0.0002906, whisper_loss=0.09976, over 3925493.80 frames. ], batch size: 94, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:11:53,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=11.70 vs. 
limit=10.0 2024-08-10 04:12:00,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=361880.0, ans=0.0 2024-08-10 04:12:07,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.897e+01 3.291e+01 3.668e+01 6.348e+01, threshold=6.581e+01, percent-clipped=0.0 2024-08-10 04:12:11,767 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.341e+01 2024-08-10 04:12:22,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=362080.0, ans=0.125 2024-08-10 04:12:32,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2024-08-10 04:12:46,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=362280.0, ans=0.5 2024-08-10 04:12:54,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2024-08-10 04:13:00,612 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 04:13:01,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7250, loss[loss=0.09019, beats_loss=0.01536, ecapa_loss=0.0003001, whisper_loss=0.07182, over 16312.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01232, ecapa_loss=0.0002913, whisper_loss=0.09945, over 3909073.60 frames. ], batch size: 71, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:13:20,287 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-10 04:13:30,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. 
limit=15.0 2024-08-10 04:13:32,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=362580.0, ans=0.125 2024-08-10 04:13:36,314 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 16 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 04:13:36,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=362580.0, ans=0.0 2024-08-10 04:13:43,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=362680.0, ans=0.1 2024-08-10 04:13:48,814 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 04:13:54,399 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 04:13:55,867 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 04:14:12,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7300, loss[loss=0.1103, beats_loss=0.01196, ecapa_loss=0.0002965, whisper_loss=0.09537, over 22216.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01236, ecapa_loss=0.0002907, whisper_loss=0.09922, over 3895938.83 frames. ], batch size: 89, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:14:26,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.31 vs. 
limit=15.0 2024-08-10 04:14:30,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.958e+01 3.364e+01 4.070e+01 6.476e+01, threshold=6.728e+01, percent-clipped=0.0 2024-08-10 04:14:53,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=363080.0, ans=0.035 2024-08-10 04:14:56,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=363180.0, ans=0.0 2024-08-10 04:15:24,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7350, loss[loss=0.1241, beats_loss=0.01272, ecapa_loss=0.0003002, whisper_loss=0.1083, over 22090.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01234, ecapa_loss=0.0002899, whisper_loss=0.09951, over 3915923.18 frames. ], batch size: 90, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:15:24,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363380.0, ans=0.0 2024-08-10 04:15:39,264 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-10 04:15:48,106 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 04:15:59,378 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 04:16:05,549 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 04:16:10,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=363680.0, ans=0.02 2024-08-10 04:16:10,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=363680.0, ans=0.125 2024-08-10 04:16:15,563 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 04:16:34,382 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 04:16:37,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7400, loss[loss=0.1083, beats_loss=0.01191, ecapa_loss=0.0002929, whisper_loss=0.09347, over 22683.00 frames. ], tot_loss[loss=0.114, beats_loss=0.0123, ecapa_loss=0.000291, whisper_loss=0.09884, over 3863635.48 frames. ], batch size: 91, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:16:40,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363880.0, ans=0.0 2024-08-10 04:16:41,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=363880.0, ans=0.125 2024-08-10 04:16:53,250 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-10 04:16:54,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.042e+01 3.418e+01 4.034e+01 8.204e+01, threshold=6.837e+01, percent-clipped=2.0 2024-08-10 04:16:56,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363980.0, ans=0.125 2024-08-10 04:17:18,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=364080.0, ans=10.0 2024-08-10 04:17:20,317 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 04:17:26,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364180.0, ans=0.1 2024-08-10 04:17:33,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364280.0, ans=0.125 2024-08-10 04:17:40,811 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 04:17:49,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7450, loss[loss=0.1044, beats_loss=0.01538, ecapa_loss=0.0002569, whisper_loss=0.08645, over 21403.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01241, ecapa_loss=0.0002934, whisper_loss=0.09777, over 3862544.24 frames. ], batch size: 87, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:17:51,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=364380.0, ans=0.125 2024-08-10 04:17:53,544 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 04:17:55,030 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 04:17:58,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=364380.0, ans=0.125 2024-08-10 04:18:23,992 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 04:19:03,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7500, loss[loss=0.1307, beats_loss=0.01032, ecapa_loss=0.0003296, whisper_loss=0.1171, over 22056.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01231, ecapa_loss=0.000294, whisper_loss=0.09795, over 3852012.62 frames. ], batch size: 90, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:19:03,422 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 04:19:04,858 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 04:19:20,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.956e+01 3.355e+01 3.815e+01 8.528e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 04:19:27,623 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5 2024-08-10 04:19:30,100 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 04:19:33,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=365080.0, ans=0.125 2024-08-10 04:19:34,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365080.0, ans=0.1 2024-08-10 04:19:41,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365080.0, ans=0.1 2024-08-10 04:19:54,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=365180.0, ans=0.125 2024-08-10 04:19:54,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=365180.0, ans=0.125 2024-08-10 04:20:01,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. 
limit=10.0 2024-08-10 04:20:16,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=365380.0, ans=0.125 2024-08-10 04:20:16,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7550, loss[loss=0.1006, beats_loss=0.01388, ecapa_loss=0.0003041, whisper_loss=0.08368, over 18819.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01219, ecapa_loss=0.0002923, whisper_loss=0.09957, over 3884209.60 frames. ], batch size: 80, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:20:17,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2024-08-10 04:20:18,819 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 04:20:31,976 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 04:20:37,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=365480.0, ans=0.2 2024-08-10 04:20:42,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=365480.0, ans=0.2 2024-08-10 04:21:30,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7600, loss[loss=0.1219, beats_loss=0.01109, ecapa_loss=0.0002714, whisper_loss=0.1081, over 14897.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01225, ecapa_loss=0.0002934, whisper_loss=0.09962, over 3857015.07 frames. ], batch size: 54, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:21:34,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=365880.0, ans=0.0 2024-08-10 04:21:41,482 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 04:21:44,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=365980.0, ans=0.125 2024-08-10 04:21:46,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 3.083e+01 3.503e+01 3.988e+01 6.295e+01, threshold=7.005e+01, percent-clipped=0.0 2024-08-10 04:21:58,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=366080.0, ans=0.05 2024-08-10 04:22:16,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=366180.0, ans=0.125 2024-08-10 04:22:16,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=366180.0, ans=0.125 2024-08-10 04:22:42,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7650, loss[loss=0.09845, beats_loss=0.01299, ecapa_loss=0.0002868, whisper_loss=0.08259, over 22298.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01237, ecapa_loss=0.0002921, whisper_loss=0.09928, over 3865766.90 frames. ], batch size: 88, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:22:46,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=366380.0, ans=0.125 2024-08-10 04:22:52,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.86 vs. limit=15.0 2024-08-10 04:22:58,967 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 04:22:59,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=366480.0, ans=0.125 2024-08-10 04:23:00,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=366480.0, ans=0.95 2024-08-10 04:23:22,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-08-10 04:23:34,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=366680.0, ans=0.125 2024-08-10 04:23:39,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.83 vs. limit=22.5 2024-08-10 04:23:40,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=12.0 2024-08-10 04:23:46,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=366780.0, ans=0.125 2024-08-10 04:23:47,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=366780.0, ans=0.125 2024-08-10 04:23:51,733 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 04:23:54,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7700, loss[loss=0.08758, beats_loss=0.01454, ecapa_loss=0.0003506, whisper_loss=0.06954, over 14224.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01234, ecapa_loss=0.000292, whisper_loss=0.09923, over 3872627.29 frames. 
], batch size: 60, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:23:54,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=366880.0, ans=0.2 2024-08-10 04:24:12,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.962e+01 3.373e+01 3.972e+01 7.552e+01, threshold=6.745e+01, percent-clipped=1.0 2024-08-10 04:24:22,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:26,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:29,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367080.0, ans=0.0 2024-08-10 04:24:35,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:37,034 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 04:24:46,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=367180.0, ans=0.0 2024-08-10 04:24:51,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=367280.0, ans=0.125 2024-08-10 04:24:51,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367280.0, ans=0.1 2024-08-10 04:24:52,804 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 04:24:58,068 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 04:25:01,006 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 04:25:06,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7750, loss[loss=0.1188, beats_loss=0.01223, ecapa_loss=0.0002846, whisper_loss=0.1037, over 20338.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01234, ecapa_loss=0.0002927, whisper_loss=0.09889, over 3848201.64 frames. ], batch size: 81, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:25:19,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=367380.0, ans=0.0 2024-08-10 04:25:28,490 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 04:25:51,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367680.0, ans=0.125 2024-08-10 04:25:54,317 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 04:26:03,952 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 04:26:08,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=367780.0, ans=0.025 2024-08-10 04:26:11,372 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 04:26:15,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. 
limit=5.0 2024-08-10 04:26:16,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=367780.0, ans=0.0 2024-08-10 04:26:18,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7800, loss[loss=0.1001, beats_loss=0.0149, ecapa_loss=0.0002695, whisper_loss=0.08254, over 14202.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01237, ecapa_loss=0.0002884, whisper_loss=0.09878, over 3912300.71 frames. ], batch size: 58, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:26:19,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=15.0 2024-08-10 04:26:20,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=367880.0, ans=0.125 2024-08-10 04:26:27,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=367880.0, ans=0.2 2024-08-10 04:26:34,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 3.081e+01 3.363e+01 3.893e+01 6.913e+01, threshold=6.726e+01, percent-clipped=1.0 2024-08-10 04:26:35,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2024-08-10 04:26:53,360 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
23 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-10 04:26:59,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=368180.0, ans=0.125 2024-08-10 04:27:06,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=368180.0, ans=0.125 2024-08-10 04:27:23,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=368280.0, ans=0.0 2024-08-10 04:27:26,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-10 04:27:28,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7850, loss[loss=0.1018, beats_loss=0.009222, ecapa_loss=0.0003644, whisper_loss=0.08893, over 14416.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01232, ecapa_loss=0.0002886, whisper_loss=0.09956, over 3935021.87 frames. ], batch size: 57, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:27:29,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.74 vs. limit=22.5 2024-08-10 04:27:32,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368380.0, ans=0.0 2024-08-10 04:27:47,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368480.0, ans=0.0 2024-08-10 04:28:01,039 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 04:28:14,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=368680.0, ans=0.125 2024-08-10 04:28:24,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=368780.0, ans=0.125 2024-08-10 04:28:27,567 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 04:28:27,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=368780.0, ans=0.0 2024-08-10 04:28:30,152 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 04:28:38,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7900, loss[loss=0.09331, beats_loss=0.01661, ecapa_loss=0.0002178, whisper_loss=0.07452, over 23451.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01238, ecapa_loss=0.0002858, whisper_loss=0.0992, over 3933414.26 frames. ], batch size: 93, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:28:40,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2024-08-10 04:28:51,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=368980.0, ans=0.0 2024-08-10 04:28:53,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=368980.0, ans=0.125 2024-08-10 04:28:54,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.975e+01 3.379e+01 4.027e+01 6.816e+01, threshold=6.758e+01, percent-clipped=1.0 2024-08-10 04:28:54,982 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
17 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 04:28:55,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368980.0, ans=0.1 2024-08-10 04:28:56,445 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 04:29:09,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=369080.0, ans=0.09899494936611666 2024-08-10 04:29:16,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369080.0, ans=0.125 2024-08-10 04:29:29,487 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-10 04:29:33,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=369280.0, ans=0.0 2024-08-10 04:29:38,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=369280.0, ans=0.2 2024-08-10 04:29:47,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 7950, loss[loss=0.1267, beats_loss=0.009299, ecapa_loss=0.000265, whisper_loss=0.1147, over 16947.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01238, ecapa_loss=0.0002841, whisper_loss=0.09979, over 3921415.15 frames. ], batch size: 64, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:29:52,613 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-10 04:29:56,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=369380.0, ans=0.125 2024-08-10 04:30:04,589 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 04:30:07,214 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 04:30:19,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=369580.0, ans=0.2 2024-08-10 04:30:20,914 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 04:30:38,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-08-10 04:30:55,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=369880.0, ans=0.125 2024-08-10 04:30:56,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8000, loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0002935, whisper_loss=0.09166, over 15841.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01235, ecapa_loss=0.0002838, whisper_loss=0.1005, over 3890982.10 frames. ], batch size: 63, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:30:57,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=369880.0, ans=0.125 2024-08-10 04:31:06,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=369880.0, ans=0.125 2024-08-10 04:31:13,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 3.026e+01 3.341e+01 3.954e+01 6.055e+01, threshold=6.681e+01, percent-clipped=0.0 2024-08-10 04:31:37,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. 
limit=15.0 2024-08-10 04:31:52,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=370280.0, ans=0.0 2024-08-10 04:31:56,028 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 29 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 04:32:05,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8050, loss[loss=0.1133, beats_loss=0.01164, ecapa_loss=0.000256, whisper_loss=0.09907, over 15107.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01229, ecapa_loss=0.0002849, whisper_loss=0.1004, over 3860288.03 frames. ], batch size: 57, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:32:10,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=12.0 2024-08-10 04:32:11,025 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 04:32:36,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370580.0, ans=0.1 2024-08-10 04:32:41,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=370580.0, ans=0.0 2024-08-10 04:32:42,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=370580.0, ans=0.0 2024-08-10 04:32:43,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=370580.0, ans=0.125 2024-08-10 04:32:51,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=370680.0, ans=0.125 2024-08-10 04:32:57,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=370680.0, ans=0.125 2024-08-10 04:32:57,720 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=370680.0, ans=0.125 2024-08-10 04:33:01,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=370780.0, ans=0.125 2024-08-10 04:33:14,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8100, loss[loss=0.1105, beats_loss=0.01295, ecapa_loss=0.0002883, whisper_loss=0.09465, over 21847.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01225, ecapa_loss=0.0002861, whisper_loss=0.1006, over 3870601.17 frames. ], batch size: 89, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:33:24,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.53 vs. limit=22.5 2024-08-10 04:33:31,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.921e+01 3.268e+01 3.818e+01 1.425e+02, threshold=6.536e+01, percent-clipped=1.0 2024-08-10 04:34:23,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8150, loss[loss=0.1245, beats_loss=0.01084, ecapa_loss=0.0002499, whisper_loss=0.1111, over 16370.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0122, ecapa_loss=0.0002877, whisper_loss=0.09997, over 3837209.20 frames. ], batch size: 62, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:34:27,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=371380.0, ans=0.125 2024-08-10 04:34:32,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=371380.0, ans=0.125 2024-08-10 04:34:39,259 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 04:34:54,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=371580.0, ans=0.125 2024-08-10 04:35:02,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=371580.0, ans=0.2 2024-08-10 04:35:08,853 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 04:35:11,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371680.0, ans=0.1 2024-08-10 04:35:18,211 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 04:35:25,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=371780.0, ans=0.125 2024-08-10 04:35:31,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8200, loss[loss=0.09451, beats_loss=0.01122, ecapa_loss=0.0003532, whisper_loss=0.07976, over 14839.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01229, ecapa_loss=0.0002876, whisper_loss=0.0991, over 3847768.33 frames. ], batch size: 59, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:35:34,920 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 04:35:46,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-08-10 04:35:48,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.993e+01 3.348e+01 3.834e+01 8.342e+01, threshold=6.697e+01, percent-clipped=3.0 2024-08-10 04:36:10,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. 
limit=12.0 2024-08-10 04:36:18,318 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.180e-02 2024-08-10 04:36:19,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=372180.0, ans=10.0 2024-08-10 04:36:23,942 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 04:36:28,226 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 04:36:42,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8250, loss[loss=0.1165, beats_loss=0.01234, ecapa_loss=0.0003232, whisper_loss=0.1009, over 20026.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01234, ecapa_loss=0.000287, whisper_loss=0.09921, over 3862622.79 frames. ], batch size: 84, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:36:46,866 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 04:36:56,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=372480.0, ans=0.2 2024-08-10 04:36:57,459 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-10 04:36:59,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2024-08-10 04:37:01,461 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
28 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 04:37:04,344 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:37:20,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=372580.0, ans=0.125 2024-08-10 04:37:21,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=372580.0, ans=0.035 2024-08-10 04:37:32,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-10 04:37:50,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-10 04:37:54,845 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8300, loss[loss=0.105, beats_loss=0.01224, ecapa_loss=0.0002753, whisper_loss=0.09, over 21837.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01243, ecapa_loss=0.0002858, whisper_loss=0.09913, over 3864947.04 frames. 
], batch size: 89, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:38:01,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=372880.0, ans=0.125 2024-08-10 04:38:05,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=372880.0, ans=0.0 2024-08-10 04:38:09,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=372980.0, ans=0.125 2024-08-10 04:38:09,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=372980.0, ans=0.5 2024-08-10 04:38:12,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+01 3.111e+01 3.544e+01 4.051e+01 1.362e+02, threshold=7.088e+01, percent-clipped=2.0 2024-08-10 04:38:21,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=372980.0, ans=0.125 2024-08-10 04:38:27,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=373080.0, ans=0.5 2024-08-10 04:38:30,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=373080.0, ans=0.125 2024-08-10 04:38:56,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2024-08-10 04:39:07,705 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8350, loss[loss=0.1057, beats_loss=0.008821, ecapa_loss=0.0003514, whisper_loss=0.09334, over 15535.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01237, ecapa_loss=0.0002887, whisper_loss=0.0997, over 3893754.95 frames. 
], batch size: 64, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:39:10,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=373380.0, ans=0.125 2024-08-10 04:39:24,060 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 04:39:32,676 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 04:39:39,685 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 04:39:42,732 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 04:39:45,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=373580.0, ans=0.0 2024-08-10 04:39:45,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=373580.0, ans=0.125 2024-08-10 04:39:47,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=373580.0, ans=0.125 2024-08-10 04:39:54,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=15.0 2024-08-10 04:40:06,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2024-08-10 04:40:14,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=373780.0, ans=0.125 2024-08-10 04:40:26,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8400, loss[loss=0.1287, beats_loss=0.01104, ecapa_loss=0.0003053, whisper_loss=0.1146, over 20785.00 frames. 
], tot_loss[loss=0.1144, beats_loss=0.01236, ecapa_loss=0.0002903, whisper_loss=0.09915, over 3867220.28 frames. ], batch size: 81, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:40:44,563 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 04:40:48,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.937e+01 3.360e+01 3.795e+01 5.469e+01, threshold=6.721e+01, percent-clipped=0.0 2024-08-10 04:41:00,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0 2024-08-10 04:41:17,702 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-10 04:41:31,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=374180.0, ans=0.05 2024-08-10 04:41:33,951 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 04:41:34,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=374180.0, ans=0.125 2024-08-10 04:41:36,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=374180.0, ans=0.125 2024-08-10 04:41:41,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=374280.0, ans=0.125 2024-08-10 04:41:56,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=374380.0, ans=0.0 2024-08-10 04:41:57,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8450, loss[loss=0.0956, beats_loss=0.01355, ecapa_loss=0.0003242, whisper_loss=0.07881, over 18305.00 frames. 
], tot_loss[loss=0.1147, beats_loss=0.01226, ecapa_loss=0.0002914, whisper_loss=0.09948, over 3877113.01 frames. ], batch size: 77, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:42:03,446 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 04:42:08,419 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 04:42:20,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=374480.0, ans=0.125 2024-08-10 04:42:28,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=374480.0, ans=0.125 2024-08-10 04:42:31,926 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 04:42:33,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=374580.0, ans=0.125 2024-08-10 04:42:40,602 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 04:42:40,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=374580.0, ans=0.125 2024-08-10 04:42:47,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=374580.0, ans=0.125 2024-08-10 04:42:48,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. 
limit=15.0 2024-08-10 04:42:58,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374680.0, ans=0.0 2024-08-10 04:43:08,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=374780.0, ans=0.0 2024-08-10 04:43:15,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=374780.0, ans=0.0 2024-08-10 04:43:27,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8500, loss[loss=0.09889, beats_loss=0.0122, ecapa_loss=0.0002592, whisper_loss=0.0841, over 22698.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01213, ecapa_loss=0.0002917, whisper_loss=0.1007, over 3913798.08 frames. ], batch size: 92, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:43:40,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=374880.0, ans=0.125 2024-08-10 04:43:48,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.067e+01 3.351e+01 3.844e+01 5.655e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 04:44:01,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375080.0, ans=0.1 2024-08-10 04:44:20,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.91 vs. 
limit=15.0 2024-08-10 04:44:42,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=375280.0, ans=0.125 2024-08-10 04:44:46,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375280.0, ans=0.0 2024-08-10 04:44:54,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8550, loss[loss=0.1237, beats_loss=0.01235, ecapa_loss=0.0002784, whisper_loss=0.1085, over 21983.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01221, ecapa_loss=0.0002909, whisper_loss=0.09998, over 3900818.03 frames. ], batch size: 88, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:45:12,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375480.0, ans=0.1 2024-08-10 04:45:15,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375480.0, ans=0.1 2024-08-10 04:45:35,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-10 04:45:37,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=375580.0, ans=0.125 2024-08-10 04:45:38,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=375580.0, ans=0.125 2024-08-10 04:45:55,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. 
limit=6.0 2024-08-10 04:46:04,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=375780.0, ans=0.125 2024-08-10 04:46:11,354 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8600, loss[loss=0.1211, beats_loss=0.01095, ecapa_loss=0.0003061, whisper_loss=0.1071, over 19985.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01224, ecapa_loss=0.0002898, whisper_loss=0.09967, over 3866342.81 frames. ], batch size: 79, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:46:14,487 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 04:46:16,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=375880.0, ans=0.2 2024-08-10 04:46:19,497 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 04:46:20,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2024-08-10 04:46:27,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.077e+01 3.509e+01 3.969e+01 6.307e+01, threshold=7.019e+01, percent-clipped=0.0 2024-08-10 04:46:28,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.96 vs. limit=15.0 2024-08-10 04:46:35,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=375980.0, ans=0.025 2024-08-10 04:46:35,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.13 vs. 
limit=6.0 2024-08-10 04:46:39,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=376080.0, ans=0.125 2024-08-10 04:46:46,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=376080.0, ans=0.2 2024-08-10 04:46:57,606 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 04:47:21,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8650, loss[loss=0.1017, beats_loss=0.01305, ecapa_loss=0.0003365, whisper_loss=0.08529, over 21367.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01225, ecapa_loss=0.0002891, whisper_loss=0.09968, over 3877490.14 frames. ], batch size: 92, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:47:30,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=376380.0, ans=0.05 2024-08-10 04:47:33,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=376380.0, ans=0.125 2024-08-10 04:47:44,139 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 04:48:02,667 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-10 04:48:06,477 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 04:48:08,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=376680.0, ans=0.0 2024-08-10 04:48:10,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-08-10 04:48:13,652 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 04:48:21,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-08-10 04:48:31,427 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8700, loss[loss=0.09677, beats_loss=0.01271, ecapa_loss=0.0003061, whisper_loss=0.081, over 19598.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01226, ecapa_loss=0.0002894, whisper_loss=0.09937, over 3869859.81 frames. ], batch size: 81, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:48:32,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-08-10 04:48:47,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.537e+01 3.015e+01 3.371e+01 3.912e+01 6.380e+01, threshold=6.741e+01, percent-clipped=0.0 2024-08-10 04:48:50,737 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-10 04:48:53,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=376980.0, ans=0.015 2024-08-10 04:48:59,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=377080.0, ans=0.2 2024-08-10 04:49:25,272 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 04:49:34,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=377280.0, ans=0.125 2024-08-10 04:49:38,765 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-10 04:49:39,862 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8750, loss[loss=0.1027, beats_loss=0.01506, ecapa_loss=0.0002349, whisper_loss=0.08531, over 16229.00 frames. 
], tot_loss[loss=0.1148, beats_loss=0.01228, ecapa_loss=0.0002904, whisper_loss=0.09965, over 3869301.61 frames. ], batch size: 66, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:49:40,186 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 04:49:41,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=377380.0, ans=0.0 2024-08-10 04:49:51,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=377380.0, ans=0.125 2024-08-10 04:49:52,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=377480.0, ans=0.0 2024-08-10 04:50:05,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377580.0, ans=0.1 2024-08-10 04:50:16,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2024-08-10 04:50:47,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8800, loss[loss=0.116, beats_loss=0.01164, ecapa_loss=0.0002708, whisper_loss=0.1016, over 22158.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01235, ecapa_loss=0.0002883, whisper_loss=0.0998, over 3884073.14 frames. ], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:50:55,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=377880.0, ans=0.125 2024-08-10 04:50:58,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.53 vs. 
limit=15.0 2024-08-10 04:51:00,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=377980.0, ans=0.125 2024-08-10 04:51:04,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.131e+01 3.473e+01 4.096e+01 6.875e+01, threshold=6.946e+01, percent-clipped=1.0 2024-08-10 04:51:08,514 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 04:51:11,509 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 04:51:13,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=377980.0, ans=0.0 2024-08-10 04:51:28,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=378180.0, ans=0.0 2024-08-10 04:51:35,251 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 04:51:55,237 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 04:51:57,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8850, loss[loss=0.119, beats_loss=0.01111, ecapa_loss=0.0002937, whisper_loss=0.1049, over 23434.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01237, ecapa_loss=0.0002877, whisper_loss=0.0994, over 3902677.45 frames. ], batch size: 93, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:52:39,325 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 04:52:56,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=378780.0, ans=0.125 2024-08-10 04:52:59,096 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 04:53:05,791 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8900, loss[loss=0.1131, beats_loss=0.01269, ecapa_loss=0.0002352, whisper_loss=0.09808, over 24308.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.0123, ecapa_loss=0.0002871, whisper_loss=0.09931, over 3893330.19 frames. ], batch size: 92, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:53:12,766 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 04:53:22,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 3.017e+01 3.379e+01 3.848e+01 7.752e+01, threshold=6.759e+01, percent-clipped=1.0 2024-08-10 04:53:37,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=379080.0, ans=0.125 2024-08-10 04:53:43,129 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 04:53:56,802 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-10 04:54:06,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=379280.0, ans=0.125 2024-08-10 04:54:06,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=379280.0, ans=0.125 2024-08-10 04:54:12,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=379280.0, ans=0.0 2024-08-10 04:54:14,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 8950, loss[loss=0.1281, beats_loss=0.009967, ecapa_loss=0.0003074, whisper_loss=0.115, over 22929.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01233, ecapa_loss=0.0002883, whisper_loss=0.09934, over 3908057.51 frames. 
], batch size: 92, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:54:14,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=379380.0, ans=0.0 2024-08-10 04:54:18,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=379380.0, ans=0.125 2024-08-10 04:54:27,892 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 04:54:31,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=379480.0, ans=0.125 2024-08-10 04:55:13,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=22.5 2024-08-10 04:55:22,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9000, loss[loss=0.09888, beats_loss=0.0135, ecapa_loss=0.0003166, whisper_loss=0.08222, over 19421.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.0124, ecapa_loss=0.0002904, whisper_loss=0.09847, over 3897150.65 frames. ], batch size: 81, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:55:22,698 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 04:56:01,316 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on ASR_libri: loss=0.2773, beats_loss=0, ecapa_loss=0.0008691, whisper_loss=0.2686, over 922467.00 frames. 2024-08-10 04:56:19,181 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on SV_voxceleb1: loss=0.007577, beats_loss=0, ecapa_loss=0.0007577, whisper_loss=0, over 939242.00 frames. 
2024-08-10 04:56:58,720 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1799, 3.7244, 4.0332, 3.9126], device='cuda:1') 2024-08-10 04:58:16,639 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on AT_audioset: loss=0.02874, beats_loss=0.02874, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 04:58:16,642 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 04:58:18,326 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 04:58:21,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379880.0, ans=0.125 2024-08-10 04:58:23,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=379880.0, ans=0.0 2024-08-10 04:58:33,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 3.022e+01 3.372e+01 4.052e+01 6.376e+01, threshold=6.745e+01, percent-clipped=0.0 2024-08-10 04:58:45,939 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 41 from LS+wenet, 30 from Vox, 18 fro AS 2024-08-10 04:59:11,197 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 04:59:13,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.98 vs. limit=15.0 2024-08-10 04:59:17,873 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 04:59:19,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=380280.0, ans=0.0 2024-08-10 04:59:25,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9050, loss[loss=0.1246, beats_loss=0.01182, ecapa_loss=0.000227, whisper_loss=0.1105, over 23627.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01231, ecapa_loss=0.0002903, whisper_loss=0.0993, over 3855118.57 frames. ], batch size: 89, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 04:59:38,190 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 04:59:42,604 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 04:59:42,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=380480.0, ans=0.0 2024-08-10 04:59:49,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380480.0, ans=0.1 2024-08-10 04:59:59,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=6.0 2024-08-10 05:00:02,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2024-08-10 05:00:05,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=380580.0, ans=0.0 2024-08-10 05:00:07,247 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 05:00:11,319 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 05:00:15,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380680.0, ans=0.1 2024-08-10 05:00:18,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=380680.0, ans=0.0 2024-08-10 05:00:22,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=380780.0, ans=0.0 2024-08-10 05:00:26,698 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 05:00:34,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9100, loss[loss=0.1235, beats_loss=0.01093, ecapa_loss=0.0003631, whisper_loss=0.1089, over 21931.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0123, ecapa_loss=0.0002908, whisper_loss=0.09894, over 3813500.29 frames. ], batch size: 93, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:00:37,552 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 05:00:37,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=380880.0, ans=0.125 2024-08-10 05:00:50,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=380980.0, ans=0.0 2024-08-10 05:00:51,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.817e+01 3.235e+01 3.647e+01 7.816e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 05:01:09,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. 
limit=15.0 2024-08-10 05:01:13,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=381080.0, ans=0.125 2024-08-10 05:01:16,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=381180.0, ans=0.0 2024-08-10 05:01:21,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=381180.0, ans=0.125 2024-08-10 05:01:31,257 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-10 05:01:31,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-10 05:01:33,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=381280.0, ans=0.125 2024-08-10 05:01:34,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381280.0, ans=0.1 2024-08-10 05:01:37,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=381280.0, ans=0.125 2024-08-10 05:01:43,586 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9150, loss[loss=0.1414, beats_loss=0.00999, ecapa_loss=0.0002253, whisper_loss=0.1292, over 20709.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01233, ecapa_loss=0.0002874, whisper_loss=0.0989, over 3835113.78 frames. ], batch size: 77, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:01:48,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=381380.0, ans=0.0 2024-08-10 05:01:50,675 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 05:01:52,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=381380.0, ans=0.125 2024-08-10 05:01:57,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=381480.0, ans=0.0 2024-08-10 05:02:07,577 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 05:02:22,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=381580.0, ans=0.2 2024-08-10 05:02:22,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=381580.0, ans=0.07 2024-08-10 05:02:29,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381680.0, ans=0.1 2024-08-10 05:02:45,879 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 05:02:52,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9200, loss[loss=0.1308, beats_loss=0.01116, ecapa_loss=0.000334, whisper_loss=0.1163, over 17671.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01239, ecapa_loss=0.000289, whisper_loss=0.09879, over 3844292.30 frames. ], batch size: 71, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:03:08,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 3.038e+01 3.317e+01 3.849e+01 8.293e+01, threshold=6.633e+01, percent-clipped=1.0 2024-08-10 05:03:30,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=382080.0, ans=0.125 2024-08-10 05:03:36,499 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 05:03:40,703 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 05:03:50,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=382280.0, ans=0.05 2024-08-10 05:03:58,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=382280.0, ans=0.0 2024-08-10 05:04:00,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9250, loss[loss=0.111, beats_loss=0.009684, ecapa_loss=0.000338, whisper_loss=0.09797, over 17486.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01246, ecapa_loss=0.0002899, whisper_loss=0.09747, over 3886708.11 frames. ], batch size: 71, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:04:01,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=382380.0, ans=0.0 2024-08-10 05:04:04,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382380.0, ans=0.1 2024-08-10 05:04:15,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=382480.0, ans=0.125 2024-08-10 05:04:16,413 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 05:04:18,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.52 vs. limit=22.5 2024-08-10 05:04:19,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-08-10 05:04:45,114 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 05:05:05,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=382780.0, ans=0.125 2024-08-10 05:05:09,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9300, loss[loss=0.09392, beats_loss=0.01254, ecapa_loss=0.0002616, whisper_loss=0.07877, over 18494.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01241, ecapa_loss=0.0002905, whisper_loss=0.09791, over 3911368.67 frames. ], batch size: 76, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:05:16,934 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-10 05:05:22,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382980.0, ans=0.1 2024-08-10 05:05:26,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.054e+01 3.480e+01 4.164e+01 1.138e+02, threshold=6.960e+01, percent-clipped=2.0 2024-08-10 05:05:28,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=382980.0, ans=15.0 2024-08-10 05:05:42,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=383080.0, ans=0.125 2024-08-10 05:05:47,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.26 vs. limit=6.0 2024-08-10 05:05:48,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0 2024-08-10 05:05:50,036 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 05:05:53,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=383180.0, ans=0.125 2024-08-10 05:06:18,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9350, loss[loss=0.09539, beats_loss=0.01512, ecapa_loss=0.0002146, whisper_loss=0.07812, over 22680.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01237, ecapa_loss=0.0002907, whisper_loss=0.09799, over 3886973.24 frames. ], batch size: 91, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:06:20,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=383380.0, ans=0.125 2024-08-10 05:06:40,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383480.0, ans=0.125 2024-08-10 05:06:48,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=383580.0, ans=0.025 2024-08-10 05:07:08,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=383680.0, ans=0.125 2024-08-10 05:07:11,046 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-10 05:07:29,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9400, loss[loss=0.1112, beats_loss=0.0137, ecapa_loss=0.0002647, whisper_loss=0.09488, over 22014.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01231, ecapa_loss=0.0002904, whisper_loss=0.09855, over 3890763.01 frames. ], batch size: 90, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:07:35,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=383880.0, ans=0.125 2024-08-10 05:07:36,438 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 05:07:44,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=383980.0, ans=0.0 2024-08-10 05:07:45,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.835e+01 3.411e+01 3.975e+01 7.515e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-10 05:07:58,010 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 05:07:59,331 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 05:08:25,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384280.0, ans=0.1 2024-08-10 05:08:27,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=384280.0, ans=15.0 2024-08-10 05:08:31,022 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 12 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 05:08:37,811 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9450, loss[loss=0.1399, beats_loss=0.01017, ecapa_loss=0.0002656, whisper_loss=0.1271, over 23054.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01228, ecapa_loss=0.0002914, whisper_loss=0.09867, over 3873957.99 frames. ], batch size: 86, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:09:08,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=384580.0, ans=0.125 2024-08-10 05:09:14,638 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 05:09:21,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=384680.0, ans=0.0 2024-08-10 05:09:29,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=384680.0, ans=0.04949747468305833 2024-08-10 05:09:30,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384680.0, ans=0.1 2024-08-10 05:09:46,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9500, loss[loss=0.1061, beats_loss=0.01361, ecapa_loss=0.0002864, whisper_loss=0.08961, over 20870.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01234, ecapa_loss=0.0002902, whisper_loss=0.09873, over 3873650.60 frames. ], batch size: 85, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:09:49,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=384880.0, ans=0.0 2024-08-10 05:10:03,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.035e+01 3.445e+01 3.941e+01 9.468e+01, threshold=6.890e+01, percent-clipped=2.0 2024-08-10 05:10:09,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-08-10 05:10:17,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=385080.0, ans=0.2 2024-08-10 05:10:31,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.26 vs. 
limit=10.0 2024-08-10 05:10:38,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=385180.0, ans=0.125 2024-08-10 05:10:55,331 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9550, loss[loss=0.1387, beats_loss=0.009673, ecapa_loss=0.0003191, whisper_loss=0.1259, over 22705.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01234, ecapa_loss=0.000288, whisper_loss=0.09824, over 3895804.04 frames. ], batch size: 90, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:11:00,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-10 05:11:06,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=385380.0, ans=0.035 2024-08-10 05:11:10,606 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 18 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-10 05:11:11,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=12.0 2024-08-10 05:11:16,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2024-08-10 05:11:21,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=385580.0, ans=0.2 2024-08-10 05:11:22,614 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 05:11:29,483 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 05:11:31,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.80 vs. 
limit=15.0 2024-08-10 05:11:46,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=385680.0, ans=0.125 2024-08-10 05:12:04,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9600, loss[loss=0.08422, beats_loss=0.01581, ecapa_loss=0.0003395, whisper_loss=0.06502, over 16933.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01237, ecapa_loss=0.0002915, whisper_loss=0.09837, over 3873017.08 frames. ], batch size: 70, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:12:16,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=385880.0, ans=0.2 2024-08-10 05:12:17,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=385980.0, ans=0.07 2024-08-10 05:12:21,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 3.010e+01 3.489e+01 4.021e+01 7.106e+01, threshold=6.979e+01, percent-clipped=1.0 2024-08-10 05:12:21,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=385980.0, ans=0.0 2024-08-10 05:12:23,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=385980.0, ans=0.2 2024-08-10 05:12:28,567 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 05:12:28,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=385980.0, ans=0.125 2024-08-10 05:12:39,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2024-08-10 05:12:49,988 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 05:12:51,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=386180.0, ans=0.2 2024-08-10 05:12:58,393 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 05:13:01,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=386280.0, ans=0.5 2024-08-10 05:13:11,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=386280.0, ans=0.07 2024-08-10 05:13:14,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9650, loss[loss=0.1022, beats_loss=0.01344, ecapa_loss=0.0003006, whisper_loss=0.0858, over 19655.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01221, ecapa_loss=0.0002922, whisper_loss=0.09867, over 3836711.68 frames. ], batch size: 82, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:13:16,826 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0 2024-08-10 05:13:20,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=386380.0, ans=0.125 2024-08-10 05:13:54,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=386580.0, ans=0.0 2024-08-10 05:13:56,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=12.0 2024-08-10 05:14:05,113 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 05:14:11,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=386780.0, ans=0.125 2024-08-10 05:14:19,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=386780.0, ans=0.0 2024-08-10 05:14:24,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9700, loss[loss=0.1411, beats_loss=0.008083, ecapa_loss=0.0003784, whisper_loss=0.1292, over 22222.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01231, ecapa_loss=0.0002935, whisper_loss=0.09784, over 3834422.50 frames. ], batch size: 88, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:14:33,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=12.0 2024-08-10 05:14:36,958 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 11 from Vox, 44 fro AS 2024-08-10 05:14:37,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=386980.0, ans=0.05 2024-08-10 05:14:40,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.854e+01 3.317e+01 3.898e+01 6.731e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 05:14:46,660 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 05:15:01,989 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 05:15:25,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=387280.0, ans=0.125 2024-08-10 05:15:29,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. 
limit=10.0 2024-08-10 05:15:33,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9750, loss[loss=0.08781, beats_loss=0.0131, ecapa_loss=0.0002971, whisper_loss=0.07174, over 19509.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01238, ecapa_loss=0.0002909, whisper_loss=0.09748, over 3873146.43 frames. ], batch size: 80, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:15:41,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387380.0, ans=0.1 2024-08-10 05:15:44,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=387380.0, ans=0.2 2024-08-10 05:15:48,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=387480.0, ans=0.2 2024-08-10 05:15:48,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=387480.0, ans=0.05 2024-08-10 05:16:00,599 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 05:16:13,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2024-08-10 05:16:24,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=387680.0, ans=0.2 2024-08-10 05:16:24,906 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 05:16:29,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=387780.0, ans=0.125 2024-08-10 05:16:43,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9800, loss[loss=0.1154, beats_loss=0.01296, ecapa_loss=0.0002308, whisper_loss=0.1001, over 22630.00 frames. 
], tot_loss[loss=0.1131, beats_loss=0.01232, ecapa_loss=0.0002906, whisper_loss=0.09787, over 3880279.97 frames. ], batch size: 84, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:16:49,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=387880.0, ans=0.2 2024-08-10 05:16:51,611 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 05:16:59,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.835e+01 3.207e+01 3.802e+01 6.736e+01, threshold=6.414e+01, percent-clipped=1.0 2024-08-10 05:17:01,500 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 05:17:05,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=387980.0, ans=0.2 2024-08-10 05:17:20,510 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 05:17:27,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2024-08-10 05:17:30,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=388180.0, ans=0.125 2024-08-10 05:17:34,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=388180.0, ans=0.2 2024-08-10 05:17:42,675 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 05:17:45,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=388280.0, ans=0.125 2024-08-10 05:17:49,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. 
limit=15.0 2024-08-10 05:17:51,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9850, loss[loss=0.1218, beats_loss=0.01104, ecapa_loss=0.0003172, whisper_loss=0.1076, over 16042.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01224, ecapa_loss=0.0002915, whisper_loss=0.09895, over 3882659.59 frames. ], batch size: 65, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:18:45,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=388780.0, ans=0.125 2024-08-10 05:18:50,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=388780.0, ans=0.125 2024-08-10 05:18:54,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388780.0, ans=0.1 2024-08-10 05:18:55,381 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 05:18:59,718 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-10 05:19:00,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9900, loss[loss=0.1184, beats_loss=0.01429, ecapa_loss=0.0002263, whisper_loss=0.1018, over 14977.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01237, ecapa_loss=0.0002889, whisper_loss=0.09813, over 3863600.60 frames. 
], batch size: 56, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:19:08,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388880.0, ans=0.1 2024-08-10 05:19:16,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=388980.0, ans=0.125 2024-08-10 05:19:17,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.911e+01 3.357e+01 3.805e+01 2.149e+02, threshold=6.715e+01, percent-clipped=2.0 2024-08-10 05:19:29,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=12.0 2024-08-10 05:19:32,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389080.0, ans=0.1 2024-08-10 05:19:40,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2024-08-10 05:19:42,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=389180.0, ans=0.2 2024-08-10 05:19:55,080 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 05:20:10,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 9950, loss[loss=0.126, beats_loss=0.01067, ecapa_loss=0.0003346, whisper_loss=0.112, over 21354.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01227, ecapa_loss=0.0002917, whisper_loss=0.09889, over 3854690.80 frames. ], batch size: 84, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:20:10,446 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 05:20:14,080 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 05:20:15,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=389380.0, ans=0.0 2024-08-10 05:20:17,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389380.0, ans=0.125 2024-08-10 05:20:21,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=389380.0, ans=0.125 2024-08-10 05:20:32,054 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 30 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 05:20:37,508 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.656e+00 2024-08-10 05:20:46,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389580.0, ans=0.1 2024-08-10 05:20:49,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=389680.0, ans=0.0 2024-08-10 05:20:56,555 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.951e-01 2024-08-10 05:21:17,618 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10000, loss[loss=0.08436, beats_loss=0.01374, ecapa_loss=0.0003011, whisper_loss=0.06762, over 13152.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01219, ecapa_loss=0.0002947, whisper_loss=0.09948, over 3815306.74 frames. 
], batch size: 57, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:21:34,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.053e+01 3.527e+01 4.199e+01 1.415e+02, threshold=7.054e+01, percent-clipped=3.0 2024-08-10 05:21:35,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=389980.0, ans=0.125 2024-08-10 05:21:36,517 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 05:21:40,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.46 vs. limit=5.0 2024-08-10 05:21:55,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2024-08-10 05:21:59,973 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 05:22:27,117 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10050, loss[loss=0.1345, beats_loss=0.01102, ecapa_loss=0.0002843, whisper_loss=0.1207, over 22490.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01217, ecapa_loss=0.0002942, whisper_loss=0.09981, over 3822187.32 frames. ], batch size: 86, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:22:29,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=390380.0, ans=0.125 2024-08-10 05:22:30,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-10 05:22:38,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.65 vs. 
limit=12.0 2024-08-10 05:22:39,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=390480.0, ans=0.125 2024-08-10 05:22:41,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=15.0 2024-08-10 05:22:45,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=390480.0, ans=0.0 2024-08-10 05:22:48,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0 2024-08-10 05:22:49,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-10 05:22:59,076 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 05:23:05,597 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 10 from Vox, 37 fro AS 2024-08-10 05:23:12,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=390680.0, ans=0.5 2024-08-10 05:23:34,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2024-08-10 05:23:34,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=390880.0, ans=22.5 2024-08-10 05:23:35,215 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10100, loss[loss=0.1173, beats_loss=0.01136, ecapa_loss=0.0002671, whisper_loss=0.1033, over 19118.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01218, ecapa_loss=0.0002936, whisper_loss=0.1, over 3864808.68 frames. 
], batch size: 74, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:23:41,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=390880.0, ans=0.1 2024-08-10 05:23:51,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.903e+01 3.270e+01 3.742e+01 9.283e+01, threshold=6.541e+01, percent-clipped=1.0 2024-08-10 05:23:54,271 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 05:23:58,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=390980.0, ans=0.07 2024-08-10 05:24:17,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=391180.0, ans=0.0 2024-08-10 05:24:40,349 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 05:24:44,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10150, loss[loss=0.1074, beats_loss=0.01135, ecapa_loss=0.0003138, whisper_loss=0.09292, over 20912.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01217, ecapa_loss=0.0002941, whisper_loss=0.09979, over 3868567.26 frames. ], batch size: 85, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:24:53,680 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 05:25:02,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.04 vs. 
limit=12.0 2024-08-10 05:25:16,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=391580.0, ans=0.0 2024-08-10 05:25:20,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=391580.0, ans=0.125 2024-08-10 05:25:23,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=391580.0, ans=0.125 2024-08-10 05:25:43,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-10 05:25:44,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=391780.0, ans=0.125 2024-08-10 05:25:44,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=391780.0, ans=0.125 2024-08-10 05:25:44,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391780.0, ans=0.1 2024-08-10 05:25:55,924 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 05:25:56,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=391880.0, ans=0.125 2024-08-10 05:25:57,016 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10200, loss[loss=0.1214, beats_loss=0.01276, ecapa_loss=0.0002686, whisper_loss=0.1059, over 20532.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01217, ecapa_loss=0.000293, whisper_loss=0.1002, over 3874183.09 frames. 
], batch size: 82, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:26:14,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.916e+01 3.286e+01 3.891e+01 7.167e+01, threshold=6.572e+01, percent-clipped=1.0 2024-08-10 05:26:46,608 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 05:26:50,519 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-10 05:26:55,502 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 05:26:55,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392280.0, ans=0.1 2024-08-10 05:26:56,920 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 05:27:00,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=392280.0, ans=0.0 2024-08-10 05:27:06,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=392280.0, ans=0.5 2024-08-10 05:27:12,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10250, loss[loss=0.1134, beats_loss=0.01524, ecapa_loss=0.0002653, whisper_loss=0.09546, over 19417.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01216, ecapa_loss=0.0002911, whisper_loss=0.09989, over 3885498.34 frames. ], batch size: 78, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:27:21,593 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 05:27:40,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.84 vs. 
limit=15.0 2024-08-10 05:27:45,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=392580.0, ans=0.0 2024-08-10 05:27:49,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=392580.0, ans=0.035 2024-08-10 05:27:59,239 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 05:28:03,584 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 05:28:11,739 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 05:28:15,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-10 05:28:19,102 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 05:28:19,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0 2024-08-10 05:28:27,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10300, loss[loss=0.1155, beats_loss=0.01212, ecapa_loss=0.0003202, whisper_loss=0.1002, over 22238.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01214, ecapa_loss=0.0002914, whisper_loss=0.1004, over 3904267.50 frames. ], batch size: 91, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:28:35,788 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 05:28:46,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.063e+01 3.413e+01 3.835e+01 1.358e+02, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 05:28:50,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-10 05:28:51,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=392980.0, ans=0.125 2024-08-10 05:28:54,055 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 05:29:06,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2024-08-10 05:29:08,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393080.0, ans=0.1 2024-08-10 05:29:25,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=393180.0, ans=0.125 2024-08-10 05:29:34,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=393280.0, ans=0.125 2024-08-10 05:29:34,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393280.0, ans=0.1 2024-08-10 05:29:45,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10350, loss[loss=0.1284, beats_loss=0.01139, ecapa_loss=0.000222, whisper_loss=0.1147, over 17304.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01231, ecapa_loss=0.0002901, whisper_loss=0.09914, over 3908799.01 frames. 
], batch size: 63, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:29:56,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=393380.0, ans=0.125 2024-08-10 05:30:01,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=393480.0, ans=0.125 2024-08-10 05:30:16,762 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 05:30:37,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=393680.0, ans=0.125 2024-08-10 05:30:43,450 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 05:30:52,710 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 26 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-10 05:30:59,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=393780.0, ans=0.0 2024-08-10 05:31:03,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10400, loss[loss=0.1112, beats_loss=0.01254, ecapa_loss=0.0002973, whisper_loss=0.09564, over 23015.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01238, ecapa_loss=0.0002891, whisper_loss=0.09841, over 3902091.51 frames. ], batch size: 93, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:31:07,715 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 05:31:13,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=393880.0, ans=0.2 2024-08-10 05:31:14,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. 
limit=15.0 2024-08-10 05:31:21,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.940e+01 3.359e+01 3.808e+01 2.361e+02, threshold=6.718e+01, percent-clipped=2.0 2024-08-10 05:31:36,916 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 05:31:37,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5 2024-08-10 05:31:52,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=394180.0, ans=0.0 2024-08-10 05:31:55,793 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 05:32:03,227 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 05:32:08,692 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 05:32:16,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10450, loss[loss=0.08699, beats_loss=0.01637, ecapa_loss=0.0002615, whisper_loss=0.06801, over 15968.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01236, ecapa_loss=0.0002871, whisper_loss=0.09848, over 3882533.67 frames. 
], batch size: 65, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:32:23,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=394380.0, ans=0.125 2024-08-10 05:32:33,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=394480.0, ans=0.0 2024-08-10 05:32:38,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=394480.0, ans=0.0 2024-08-10 05:32:54,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=394580.0, ans=0.04949747468305833 2024-08-10 05:32:58,827 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 05:33:10,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-10 05:33:20,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394780.0, ans=0.1 2024-08-10 05:33:21,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-10 05:33:30,088 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 6 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 05:33:31,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10500, loss[loss=0.05107, beats_loss=0.01466, ecapa_loss=0.0002818, whisper_loss=0.03359, over 13660.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01227, ecapa_loss=0.000289, whisper_loss=0.09861, over 3872751.96 frames. ], batch size: 57, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:33:34,130 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 05:33:38,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=394880.0, ans=0.0 2024-08-10 05:33:40,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=394880.0, ans=0.07 2024-08-10 05:33:45,126 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 05:33:49,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.533e+01 2.971e+01 3.381e+01 3.721e+01 5.999e+01, threshold=6.761e+01, percent-clipped=0.0 2024-08-10 05:33:49,817 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 05:34:04,851 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 05:34:17,887 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 05:34:19,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395180.0, ans=0.1 2024-08-10 05:34:35,646 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 05:34:46,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10550, loss[loss=0.1209, beats_loss=0.0124, ecapa_loss=0.0002626, whisper_loss=0.1059, over 23003.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01219, ecapa_loss=0.0002909, whisper_loss=0.09966, over 3888963.49 frames. ], batch size: 92, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:34:53,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. 
limit=15.0 2024-08-10 05:35:34,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-10 05:35:41,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395680.0, ans=0.125 2024-08-10 05:35:52,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=12.0 2024-08-10 05:36:01,394 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.459e+01 2024-08-10 05:36:02,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10600, loss[loss=0.1194, beats_loss=0.009387, ecapa_loss=0.0003088, whisper_loss=0.1069, over 18474.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0122, ecapa_loss=0.0002906, whisper_loss=0.09986, over 3892839.41 frames. ], batch size: 71, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:36:16,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=395980.0, ans=0.125 2024-08-10 05:36:16,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395980.0, ans=0.1 2024-08-10 05:36:19,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.978e+01 3.470e+01 3.932e+01 9.831e+01, threshold=6.940e+01, percent-clipped=1.0 2024-08-10 05:36:25,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=395980.0, ans=0.05 2024-08-10 05:36:26,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=395980.0, ans=0.125 2024-08-10 05:36:26,947 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=395980.0, ans=0.0 2024-08-10 05:36:41,835 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.322e-01 2024-08-10 05:36:51,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=396180.0, ans=0.0 2024-08-10 05:36:59,000 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 05:37:10,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=396280.0, ans=0.0 2024-08-10 05:37:17,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10650, loss[loss=0.1089, beats_loss=0.01358, ecapa_loss=0.000238, whisper_loss=0.09299, over 22776.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.0122, ecapa_loss=0.0002911, whisper_loss=0.09945, over 3861453.09 frames. ], batch size: 89, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:37:20,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=396380.0, ans=0.05 2024-08-10 05:37:27,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=396380.0, ans=0.125 2024-08-10 05:37:55,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=396580.0, ans=0.2 2024-08-10 05:37:57,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=396580.0, ans=0.125 2024-08-10 05:38:00,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=396580.0, ans=0.125 2024-08-10 05:38:04,745 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=396680.0, ans=0.125 2024-08-10 05:38:13,631 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 05:38:26,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396780.0, ans=0.125 2024-08-10 05:38:32,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10700, loss[loss=0.1277, beats_loss=0.01235, ecapa_loss=0.0002715, whisper_loss=0.1127, over 17228.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01223, ecapa_loss=0.0002899, whisper_loss=0.09979, over 3868604.82 frames. ], batch size: 69, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:38:38,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=396880.0, ans=0.2 2024-08-10 05:38:47,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=396980.0, ans=0.125 2024-08-10 05:38:49,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.168e+01 3.517e+01 4.154e+01 8.442e+01, threshold=7.034e+01, percent-clipped=1.0 2024-08-10 05:39:07,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=397080.0, ans=0.0 2024-08-10 05:39:12,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=397080.0, ans=0.2 2024-08-10 05:39:18,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=397180.0, ans=0.125 2024-08-10 05:39:21,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=397180.0, ans=0.2 2024-08-10 05:39:23,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, 
batch_count=397180.0, ans=0.2 2024-08-10 05:39:26,705 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 05:39:32,535 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 05:39:32,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=397280.0, ans=0.125 2024-08-10 05:39:38,756 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 05:39:40,114 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 05:39:44,315 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.823e+00 2024-08-10 05:39:47,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-08-10 05:39:47,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10750, loss[loss=0.1124, beats_loss=0.01343, ecapa_loss=0.0002845, whisper_loss=0.09613, over 18333.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01225, ecapa_loss=0.0002888, whisper_loss=0.09998, over 3850139.49 frames. ], batch size: 72, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:39:48,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=397380.0, ans=0.125 2024-08-10 05:39:49,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=397380.0, ans=0.0 2024-08-10 05:40:11,469 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-10 05:40:17,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.30 vs. 
limit=15.0 2024-08-10 05:40:31,478 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 26 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 05:40:32,870 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 05:41:02,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10800, loss[loss=0.1233, beats_loss=0.01297, ecapa_loss=0.000324, whisper_loss=0.1071, over 21952.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01224, ecapa_loss=0.0002896, whisper_loss=0.09932, over 3870513.61 frames. ], batch size: 89, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:41:03,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-10 05:41:06,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=397880.0, ans=0.0 2024-08-10 05:41:08,897 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 05:41:19,013 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-10 05:41:20,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.911e+01 3.259e+01 3.950e+01 6.115e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-10 05:41:32,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=398080.0, ans=0.025 2024-08-10 05:41:48,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-10 05:42:18,462 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10850, loss[loss=0.1123, beats_loss=0.01244, ecapa_loss=0.0003341, whisper_loss=0.09648, over 21671.00 frames. 
], tot_loss[loss=0.1139, beats_loss=0.01227, ecapa_loss=0.0002899, whisper_loss=0.09876, over 3862494.79 frames. ], batch size: 92, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:42:35,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=398480.0, ans=0.0 2024-08-10 05:42:41,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=398480.0, ans=0.0 2024-08-10 05:42:59,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=398580.0, ans=0.2 2024-08-10 05:43:20,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=398780.0, ans=0.07 2024-08-10 05:43:35,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10900, loss[loss=0.1071, beats_loss=0.01273, ecapa_loss=0.0002527, whisper_loss=0.09185, over 18538.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01218, ecapa_loss=0.0002916, whisper_loss=0.1001, over 3921843.53 frames. 
], batch size: 75, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:43:41,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=398880.0, ans=0.125 2024-08-10 05:43:49,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=398980.0, ans=0.125 2024-08-10 05:43:53,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.145e+01 3.517e+01 3.996e+01 1.577e+02, threshold=7.034e+01, percent-clipped=2.0 2024-08-10 05:43:59,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398980.0, ans=0.1 2024-08-10 05:44:01,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=398980.0, ans=0.125 2024-08-10 05:44:17,110 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 05:44:49,855 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 10950, loss[loss=0.1179, beats_loss=0.01325, ecapa_loss=0.0002507, whisper_loss=0.1022, over 21915.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01223, ecapa_loss=0.0002892, whisper_loss=0.09959, over 3903634.31 frames. ], batch size: 86, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:44:50,159 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 05:45:01,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=399380.0, ans=0.125 2024-08-10 05:45:02,274 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 05:45:02,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=399380.0, ans=0.125 2024-08-10 05:45:11,186 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 05:45:11,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=399480.0, ans=0.125 2024-08-10 05:45:11,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-10 05:45:19,779 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 05:45:30,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.74 vs. limit=15.0 2024-08-10 05:45:45,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=399680.0, ans=0.125 2024-08-10 05:45:46,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=399680.0, ans=0.0 2024-08-10 05:46:04,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399880.0, ans=0.1 2024-08-10 05:46:05,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11000, loss[loss=0.09549, beats_loss=0.0146, ecapa_loss=0.0002533, whisper_loss=0.07835, over 17529.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01221, ecapa_loss=0.00029, whisper_loss=0.09903, over 3923271.80 frames. 
], batch size: 68, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:46:12,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-10 05:46:26,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.898e+01 3.404e+01 3.976e+01 6.521e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 05:46:28,255 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-10 05:46:29,667 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 05:46:38,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2024-08-10 05:46:52,090 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 05:47:10,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=400280.0, ans=0.0 2024-08-10 05:47:17,793 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 05:47:21,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11050, loss[loss=0.11, beats_loss=0.01246, ecapa_loss=0.0002893, whisper_loss=0.09468, over 20890.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01212, ecapa_loss=0.0002897, whisper_loss=0.09892, over 3919438.08 frames. ], batch size: 87, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:47:22,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=400380.0, ans=0.0 2024-08-10 05:47:41,423 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 05:48:33,218 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 05:48:34,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11100, loss[loss=0.124, beats_loss=0.01044, ecapa_loss=0.0003407, whisper_loss=0.1102, over 17055.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01218, ecapa_loss=0.0002881, whisper_loss=0.099, over 3908809.39 frames. ], batch size: 71, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:48:42,642 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 05:48:52,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.998e+01 3.322e+01 3.680e+01 7.626e+01, threshold=6.644e+01, percent-clipped=1.0 2024-08-10 05:48:55,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400980.0, ans=0.125 2024-08-10 05:49:13,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=401080.0, ans=0.1 2024-08-10 05:49:32,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=401180.0, ans=0.125 2024-08-10 05:49:41,360 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 05:49:50,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11150, loss[loss=0.1133, beats_loss=0.01058, ecapa_loss=0.0003218, whisper_loss=0.09953, over 19248.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01214, ecapa_loss=0.0002886, whisper_loss=0.09916, over 3902983.30 frames. ], batch size: 79, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:49:55,366 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 05:50:04,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=401480.0, ans=0.0 2024-08-10 05:50:15,385 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 05:50:23,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2024-08-10 05:50:40,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401680.0, ans=0.0 2024-08-10 05:50:45,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-10 05:51:01,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11200, loss[loss=0.1114, beats_loss=0.01201, ecapa_loss=0.0002652, whisper_loss=0.09674, over 22871.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0121, ecapa_loss=0.0002895, whisper_loss=0.09913, over 3880871.67 frames. ], batch size: 91, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:51:08,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-08-10 05:51:11,871 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 05:51:18,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.098e+01 3.521e+01 4.109e+01 7.831e+01, threshold=7.041e+01, percent-clipped=1.0 2024-08-10 05:51:21,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=401980.0, ans=0.125 2024-08-10 05:51:35,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-10 05:51:41,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402080.0, ans=0.1 2024-08-10 05:51:43,982 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 28 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 05:51:45,424 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 05:51:50,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=402180.0, ans=0.0 2024-08-10 05:51:50,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=402180.0, ans=0.1 2024-08-10 05:51:52,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=402180.0, ans=0.2 2024-08-10 05:51:58,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402180.0, ans=0.125 2024-08-10 05:51:59,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=402280.0, ans=0.1 2024-08-10 05:52:13,816 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 05:52:16,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11250, loss[loss=0.1145, beats_loss=0.01308, ecapa_loss=0.0002812, whisper_loss=0.09864, over 22349.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.0121, ecapa_loss=0.000291, whisper_loss=0.09916, over 3884745.16 frames. ], batch size: 93, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:52:19,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.81 vs. limit=22.5 2024-08-10 05:52:34,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=402480.0, ans=0.05 2024-08-10 05:52:55,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=12.0 2024-08-10 05:53:00,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402680.0, ans=0.125 2024-08-10 05:53:02,355 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 05:53:06,814 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 05:53:25,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402780.0, ans=0.1 2024-08-10 05:53:27,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11300, loss[loss=0.1257, beats_loss=0.01116, ecapa_loss=0.0002593, whisper_loss=0.1119, over 21962.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01212, ecapa_loss=0.000289, whisper_loss=0.09921, over 3887332.33 frames. 
], batch size: 87, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:53:40,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402980.0, ans=0.125 2024-08-10 05:53:40,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402980.0, ans=0.125 2024-08-10 05:53:44,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.975e+01 3.482e+01 3.976e+01 1.269e+02, threshold=6.963e+01, percent-clipped=1.0 2024-08-10 05:54:05,976 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 05:54:06,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=403080.0, ans=0.0 2024-08-10 05:54:06,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=403080.0, ans=0.125 2024-08-10 05:54:12,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-08-10 05:54:39,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11350, loss[loss=0.1283, beats_loss=0.01333, ecapa_loss=0.0002442, whisper_loss=0.1125, over 21562.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01205, ecapa_loss=0.0002884, whisper_loss=0.09954, over 3900261.25 frames. ], batch size: 84, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:54:57,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=403480.0, ans=0.125 2024-08-10 05:55:14,567 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 05:55:22,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=403580.0, ans=0.125 2024-08-10 05:55:27,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=15.0 2024-08-10 05:55:31,704 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 05:55:34,358 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 05:55:39,188 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 05:55:41,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2024-08-10 05:55:55,410 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11400, loss[loss=0.1171, beats_loss=0.01462, ecapa_loss=0.0002577, whisper_loss=0.09992, over 23003.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01214, ecapa_loss=0.0002899, whisper_loss=0.09887, over 3902090.13 frames. ], batch size: 93, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:55:58,696 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 05:55:58,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=403880.0, ans=0.025 2024-08-10 05:56:03,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.83 vs. 
limit=15.0 2024-08-10 05:56:13,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 3.091e+01 3.465e+01 3.981e+01 8.996e+01, threshold=6.931e+01, percent-clipped=1.0 2024-08-10 05:56:16,280 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-10 05:56:16,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=403980.0, ans=0.1 2024-08-10 05:56:39,986 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 05:57:03,556 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 05:57:03,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=404280.0, ans=0.125 2024-08-10 05:57:08,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11450, loss[loss=0.1134, beats_loss=0.01098, ecapa_loss=0.0003299, whisper_loss=0.09911, over 18827.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01224, ecapa_loss=0.0002897, whisper_loss=0.09863, over 3872852.30 frames. ], batch size: 78, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:57:15,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=404380.0, ans=0.2 2024-08-10 05:57:18,426 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 05:57:24,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=404480.0, ans=0.1 2024-08-10 05:57:51,649 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 05:57:52,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. 
limit=12.0 2024-08-10 05:57:53,435 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 05:57:54,953 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 05:57:59,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=404680.0, ans=0.0 2024-08-10 05:58:03,013 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0 2024-08-10 05:58:08,843 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 35 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 05:58:09,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=22.5 2024-08-10 05:58:14,541 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 05:58:24,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11500, loss[loss=0.07757, beats_loss=0.0151, ecapa_loss=0.0002775, whisper_loss=0.0597, over 17109.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01224, ecapa_loss=0.0002909, whisper_loss=0.09882, over 3868766.29 frames. ], batch size: 71, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:58:28,124 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 05:58:29,501 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 05:58:42,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.195e+01 3.620e+01 4.078e+01 2.789e+02, threshold=7.241e+01, percent-clipped=1.0 2024-08-10 05:58:44,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=404980.0, ans=0.0 2024-08-10 05:58:48,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=404980.0, ans=0.0 2024-08-10 05:59:05,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405080.0, ans=0.125 2024-08-10 05:59:25,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=405280.0, ans=0.1 2024-08-10 05:59:30,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=405280.0, ans=0.125 2024-08-10 05:59:31,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=405280.0, ans=0.1 2024-08-10 05:59:38,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11550, loss[loss=0.117, beats_loss=0.0136, ecapa_loss=0.0002707, whisper_loss=0.1007, over 23872.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01225, ecapa_loss=0.0002902, whisper_loss=0.09854, over 3845508.68 frames. 
], batch size: 92, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:59:48,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=405380.0, ans=0.025 2024-08-10 06:00:01,770 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.705e+01 2024-08-10 06:00:14,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405580.0, ans=0.1 2024-08-10 06:00:19,678 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 06:00:43,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=405780.0, ans=0.05 2024-08-10 06:00:54,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11600, loss[loss=0.1115, beats_loss=0.01039, ecapa_loss=0.0002933, whisper_loss=0.09816, over 16332.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01228, ecapa_loss=0.0002926, whisper_loss=0.09904, over 3901050.22 frames. ], batch size: 63, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:00:56,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=405880.0, ans=15.0 2024-08-10 06:00:57,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=405880.0, ans=0.0 2024-08-10 06:00:57,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=405880.0, ans=0.2 2024-08-10 06:00:58,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=405880.0, ans=0.2 2024-08-10 06:01:08,775 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-10 06:01:11,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.361e+01 3.673e+01 4.425e+01 6.331e+01, threshold=7.346e+01, percent-clipped=0.0 2024-08-10 06:01:12,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=405980.0, ans=0.125 2024-08-10 06:01:22,517 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 06:01:41,283 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 06:01:49,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406180.0, ans=0.1 2024-08-10 06:01:54,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=406280.0, ans=0.0 2024-08-10 06:02:06,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11650, loss[loss=0.093, beats_loss=0.01316, ecapa_loss=0.0002834, whisper_loss=0.077, over 19191.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01221, ecapa_loss=0.0002929, whisper_loss=0.09968, over 3926192.40 frames. ], batch size: 78, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:02:29,132 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
14 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 06:02:36,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406580.0, ans=0.1 2024-08-10 06:02:43,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406580.0, ans=0.1 2024-08-10 06:02:46,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=406580.0, ans=0.0 2024-08-10 06:02:49,198 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 06:02:49,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=406680.0, ans=0.125 2024-08-10 06:03:12,871 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 06:03:16,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11700, loss[loss=0.1067, beats_loss=0.01259, ecapa_loss=0.0002957, whisper_loss=0.09111, over 18080.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01228, ecapa_loss=0.0002936, whisper_loss=0.09853, over 3929160.47 frames. ], batch size: 72, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:03:17,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=406880.0, ans=0.0 2024-08-10 06:03:25,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=406880.0, ans=0.025 2024-08-10 06:03:33,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.451e+01 3.237e+01 3.576e+01 4.266e+01 6.520e+01, threshold=7.151e+01, percent-clipped=0.0 2024-08-10 06:03:35,336 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 06:03:38,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=406980.0, ans=0.09899494936611666 2024-08-10 06:03:40,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=406980.0, ans=0.125 2024-08-10 06:03:47,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=407080.0, ans=0.125 2024-08-10 06:03:48,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=407080.0, ans=0.025 2024-08-10 06:03:53,242 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 06:04:11,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.50 vs. limit=10.0 2024-08-10 06:04:28,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=407380.0, ans=0.125 2024-08-10 06:04:28,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11750, loss[loss=0.1193, beats_loss=0.01244, ecapa_loss=0.0002734, whisper_loss=0.1042, over 21703.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.0122, ecapa_loss=0.0002943, whisper_loss=0.09933, over 3912898.55 frames. 
], batch size: 88, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:04:33,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407380.0, ans=0.1 2024-08-10 06:04:43,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=407480.0, ans=0.5 2024-08-10 06:05:40,181 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 06:05:40,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407780.0, ans=0.125 2024-08-10 06:05:43,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11800, loss[loss=0.1051, beats_loss=0.01233, ecapa_loss=0.0002932, whisper_loss=0.08982, over 19731.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01229, ecapa_loss=0.000292, whisper_loss=0.09917, over 3928403.51 frames. ], batch size: 81, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:05:47,054 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 06:05:51,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=407880.0, ans=0.0 2024-08-10 06:05:58,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=407980.0, ans=0.1 2024-08-10 06:05:59,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 3.074e+01 3.455e+01 3.897e+01 7.543e+01, threshold=6.910e+01, percent-clipped=1.0 2024-08-10 06:06:08,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-10 06:06:13,539 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
10 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 06:06:17,722 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 06:06:23,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=408180.0, ans=0.125 2024-08-10 06:06:24,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-10 06:06:26,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=408180.0, ans=0.2 2024-08-10 06:06:31,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=408180.0, ans=15.0 2024-08-10 06:06:50,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=408280.0, ans=0.125 2024-08-10 06:06:53,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11850, loss[loss=0.1224, beats_loss=0.0124, ecapa_loss=0.0002193, whisper_loss=0.1078, over 21195.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01231, ecapa_loss=0.0002904, whisper_loss=0.09846, over 3910005.57 frames. ], batch size: 81, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:07:05,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=408380.0, ans=0.0 2024-08-10 06:07:10,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408480.0, ans=0.1 2024-08-10 06:07:19,719 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 06:07:21,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=408580.0, ans=0.025 2024-08-10 06:07:24,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=408580.0, ans=0.0 2024-08-10 06:07:37,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=408680.0, ans=0.125 2024-08-10 06:07:37,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408680.0, ans=0.1 2024-08-10 06:07:44,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=408680.0, ans=0.125 2024-08-10 06:08:03,987 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11900, loss[loss=0.1184, beats_loss=0.01421, ecapa_loss=0.0002374, whisper_loss=0.1018, over 17662.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01229, ecapa_loss=0.0002882, whisper_loss=0.09895, over 3905719.30 frames. ], batch size: 70, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:08:09,593 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 06:08:14,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=408880.0, ans=0.1 2024-08-10 06:08:16,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-10 06:08:16,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. 
limit=22.5 2024-08-10 06:08:20,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 3.266e+01 3.553e+01 4.247e+01 1.215e+02, threshold=7.106e+01, percent-clipped=1.0 2024-08-10 06:08:22,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=408980.0, ans=0.125 2024-08-10 06:08:27,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=408980.0, ans=0.035 2024-08-10 06:08:38,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=409080.0, ans=0.125 2024-08-10 06:09:06,578 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 06:09:13,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 11950, loss[loss=0.1385, beats_loss=0.008967, ecapa_loss=0.0002453, whisper_loss=0.1271, over 17494.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01218, ecapa_loss=0.0002893, whisper_loss=0.09913, over 3896690.67 frames. ], batch size: 64, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:09:33,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=409480.0, ans=0.035 2024-08-10 06:09:40,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409580.0, ans=0.1 2024-08-10 06:09:55,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=409680.0, ans=0.0 2024-08-10 06:09:55,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-10 06:10:12,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.22 vs. 
limit=15.0 2024-08-10 06:10:14,991 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 06:10:17,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=409780.0, ans=0.125 2024-08-10 06:10:22,732 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12000, loss[loss=0.1143, beats_loss=0.01235, ecapa_loss=0.0002847, whisper_loss=0.09908, over 22920.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01226, ecapa_loss=0.0002876, whisper_loss=0.09843, over 3878416.06 frames. ], batch size: 90, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:10:22,733 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 06:11:01,960 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on ASR_libri: loss=0.2695, beats_loss=0, ecapa_loss=0.000863, whisper_loss=0.2608, over 922467.00 frames. 2024-08-10 06:11:17,651 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on SV_voxceleb1: loss=0.007635, beats_loss=0, ecapa_loss=0.0007635, whisper_loss=0, over 939242.00 frames. 2024-08-10 06:13:11,076 INFO [train_multi_KD3.py:1149] (1/4) Epoch 3, validation on AT_audioset: loss=0.0284, beats_loss=0.0284, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 06:13:11,080 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 06:13:23,170 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:13:28,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.155e+01 3.494e+01 4.116e+01 7.765e+01, threshold=6.989e+01, percent-clipped=1.0 2024-08-10 06:13:30,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=409980.0, ans=0.125 2024-08-10 06:13:41,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=410080.0, ans=0.125 2024-08-10 06:13:52,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=410080.0, ans=0.0 2024-08-10 06:13:52,646 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.759e+00 2024-08-10 06:14:21,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=410280.0, ans=0.125 2024-08-10 06:14:23,363 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12050, loss[loss=0.1028, beats_loss=0.01453, ecapa_loss=0.0002732, whisper_loss=0.08556, over 20027.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01215, ecapa_loss=0.0002886, whisper_loss=0.09913, over 3852649.52 frames. ], batch size: 80, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:14:25,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410380.0, ans=0.1 2024-08-10 06:14:29,213 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
27 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 06:14:34,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410380.0, ans=0.125 2024-08-10 06:14:37,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=410480.0, ans=0.125 2024-08-10 06:14:37,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=15.0 2024-08-10 06:14:52,860 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 06:14:57,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=410580.0, ans=0.0 2024-08-10 06:15:00,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-10 06:15:11,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410680.0, ans=0.1 2024-08-10 06:15:13,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=410680.0, ans=0.125 2024-08-10 06:15:22,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=410780.0, ans=0.125 2024-08-10 06:15:33,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12100, loss[loss=0.1058, beats_loss=0.01237, ecapa_loss=0.0003258, whisper_loss=0.09014, over 17188.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01222, ecapa_loss=0.0002893, whisper_loss=0.0992, over 3866108.86 frames. 
], batch size: 71, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:15:37,749 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.777e+03 2024-08-10 06:15:40,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=410880.0, ans=0.125 2024-08-10 06:15:49,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 3.160e+01 3.535e+01 4.240e+01 9.123e+01, threshold=7.071e+01, percent-clipped=3.0 2024-08-10 06:15:55,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410980.0, ans=0.1 2024-08-10 06:16:05,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411080.0, ans=0.1 2024-08-10 06:16:17,078 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 06:16:40,552 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 06:16:40,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=411380.0, ans=0.125 2024-08-10 06:16:41,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12150, loss[loss=0.1136, beats_loss=0.0131, ecapa_loss=0.0003219, whisper_loss=0.09732, over 16642.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01228, ecapa_loss=0.0002905, whisper_loss=0.0983, over 3841159.32 frames. ], batch size: 67, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:16:53,148 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 06:16:54,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411480.0, ans=0.1 2024-08-10 06:16:54,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411480.0, ans=0.125 2024-08-10 06:17:03,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5 2024-08-10 06:17:11,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=411580.0, ans=0.125 2024-08-10 06:17:23,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=411680.0, ans=0.0 2024-08-10 06:17:30,617 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 06:17:30,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=411680.0, ans=0.125 2024-08-10 06:17:34,466 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-10 06:17:50,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12200, loss[loss=0.1038, beats_loss=0.01026, ecapa_loss=0.0003013, whisper_loss=0.09054, over 16896.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01222, ecapa_loss=0.0002907, whisper_loss=0.09844, over 3832820.89 frames. ], batch size: 67, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:17:58,164 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 06:18:07,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 3.192e+01 3.663e+01 4.187e+01 6.724e+01, threshold=7.326e+01, percent-clipped=0.0 2024-08-10 06:18:11,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=411980.0, ans=0.125 2024-08-10 06:18:20,382 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 06:18:22,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-10 06:18:36,477 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.678e+00 2024-08-10 06:18:36,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412180.0, ans=0.1 2024-08-10 06:18:43,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=412180.0, ans=0.125 2024-08-10 06:18:44,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=412180.0, ans=0.0 2024-08-10 06:18:50,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=412280.0, ans=0.07 2024-08-10 06:18:59,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-10 06:19:03,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12250, loss[loss=0.105, beats_loss=0.01048, ecapa_loss=0.0003346, whisper_loss=0.0912, over 21798.00 frames. 
], tot_loss[loss=0.1138, beats_loss=0.01206, ecapa_loss=0.0002938, whisper_loss=0.09879, over 3824342.03 frames. ], batch size: 91, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:19:18,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=412480.0, ans=0.125 2024-08-10 06:19:20,167 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 06:19:35,466 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 06:20:03,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=412780.0, ans=0.125 2024-08-10 06:20:12,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12300, loss[loss=0.1044, beats_loss=0.01405, ecapa_loss=0.0002708, whisper_loss=0.08765, over 15930.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01219, ecapa_loss=0.000293, whisper_loss=0.09804, over 3836035.59 frames. ], batch size: 66, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:20:13,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=412880.0, ans=0.125 2024-08-10 06:20:18,012 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 06:20:28,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.356e+01 3.807e+01 4.575e+01 1.219e+02, threshold=7.614e+01, percent-clipped=2.0 2024-08-10 06:20:41,791 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 27 from Vox, 14 fro AS 2024-08-10 06:20:43,159 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 13 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 06:20:45,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. 
limit=22.5 2024-08-10 06:20:52,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=413180.0, ans=0.0 2024-08-10 06:20:58,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413180.0, ans=0.1 2024-08-10 06:21:21,634 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12350, loss[loss=0.138, beats_loss=0.00762, ecapa_loss=0.0003253, whisper_loss=0.1271, over 18903.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01207, ecapa_loss=0.0002966, whisper_loss=0.09849, over 3842164.02 frames. ], batch size: 73, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:21:21,892 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS 2024-08-10 06:21:36,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413480.0, ans=0.0 2024-08-10 06:21:41,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=413480.0, ans=0.125 2024-08-10 06:22:02,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=12.0 2024-08-10 06:22:08,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0 2024-08-10 06:22:10,711 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 06:22:20,823 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 06:22:30,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12400, loss[loss=0.1021, beats_loss=0.01544, ecapa_loss=0.0002534, whisper_loss=0.08417, over 22169.00 frames. 
], tot_loss[loss=0.1141, beats_loss=0.01209, ecapa_loss=0.0002936, whisper_loss=0.09907, over 3846554.97 frames. ], batch size: 88, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:22:38,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=12.0 2024-08-10 06:22:47,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.123e+01 3.503e+01 4.019e+01 1.294e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 06:23:19,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=414180.0, ans=0.1 2024-08-10 06:23:20,291 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 14 from Vox, 51 from AS 2024-08-10 06:23:30,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414280.0, ans=0.1 2024-08-10 06:23:32,613 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 from AS 2024-08-10 06:23:35,373 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 06:23:39,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12450, loss[loss=0.08267, beats_loss=0.0142, ecapa_loss=0.000225, whisper_loss=0.06621, over 14247.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01216, ecapa_loss=0.0002911, whisper_loss=0.09865, over 3830905.60 frames. ], batch size: 54, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:03,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2024-08-10 06:24:08,416 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 30 from Vox, 34 from AS 2024-08-10 06:24:11,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. limit=10.0 2024-08-10 06:24:12,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=414580.0, ans=0.2 2024-08-10 06:24:16,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-10 06:24:17,898 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 06:24:20,765 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:24:30,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=414680.0, ans=0.125 2024-08-10 06:24:40,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=414780.0, ans=0.125 2024-08-10 06:24:41,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=414780.0, ans=0.2 2024-08-10 06:24:46,676 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 06:24:49,275 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12500, loss[loss=0.1073, beats_loss=0.0122, ecapa_loss=0.0002704, whisper_loss=0.09239, over 16385.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01221, ecapa_loss=0.0002909, whisper_loss=0.09858, over 3849432.34 frames. ], batch size: 62, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:53,771 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
22 from LS+wenet, 19 from Vox, 45 from AS 2024-08-10 06:24:57,702 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 from AS 2024-08-10 06:25:05,499 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.775e-01 2024-08-10 06:25:06,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 3.279e+01 3.697e+01 4.212e+01 5.815e+01, threshold=7.393e+01, percent-clipped=0.0 2024-08-10 06:25:14,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=414980.0, ans=0.125 2024-08-10 06:25:32,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=415180.0, ans=0.2 2024-08-10 06:25:32,950 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 from AS 2024-08-10 06:26:01,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12550, loss[loss=0.1232, beats_loss=0.009975, ecapa_loss=0.0003066, whisper_loss=0.1101, over 22892.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01231, ecapa_loss=0.0002885, whisper_loss=0.0983, over 3864660.64 frames. 
], batch size: 91, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:26:12,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=415380.0, ans=0.0 2024-08-10 06:26:21,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=415480.0, ans=0.125 2024-08-10 06:26:45,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415680.0, ans=0.1 2024-08-10 06:26:48,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415680.0, ans=0.125 2024-08-10 06:26:51,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415680.0, ans=0.1 2024-08-10 06:26:51,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=415680.0, ans=0.0 2024-08-10 06:26:57,929 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 06:27:04,605 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 from AS 2024-08-10 06:27:14,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12600, loss[loss=0.09822, beats_loss=0.01385, ecapa_loss=0.0002382, whisper_loss=0.08199, over 14708.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01233, ecapa_loss=0.0002905, whisper_loss=0.09818, over 3878775.64 frames. 
], batch size: 56, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:27:32,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415980.0, ans=0.1 2024-08-10 06:27:34,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=415980.0, ans=0.125 2024-08-10 06:27:35,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 3.161e+01 3.514e+01 4.071e+01 6.890e+01, threshold=7.028e+01, percent-clipped=0.0 2024-08-10 06:27:43,466 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 06:27:48,898 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.290e+03 2024-08-10 06:27:48,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=416080.0, ans=0.125 2024-08-10 06:27:58,155 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 06:28:05,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2024-08-10 06:28:13,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-10 06:28:17,187 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 06:28:35,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=416280.0, ans=0.0 2024-08-10 06:28:38,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12650, loss[loss=0.08879, beats_loss=0.01719, ecapa_loss=0.0001994, whisper_loss=0.06961, over 15806.00 frames. 
], tot_loss[loss=0.1137, beats_loss=0.01241, ecapa_loss=0.0002877, whisper_loss=0.09838, over 3896288.33 frames. ], batch size: 63, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:28:40,194 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 from AS 2024-08-10 06:28:44,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=416380.0, ans=0.125 2024-08-10 06:28:47,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=416380.0, ans=0.2 2024-08-10 06:28:52,029 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 13 from Vox, 36 from AS 2024-08-10 06:28:54,835 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 06:29:00,886 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 from AS 2024-08-10 06:29:07,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=416580.0, ans=0.125 2024-08-10 06:29:14,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=416580.0, ans=0.04949747468305833 2024-08-10 06:29:20,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416580.0, ans=0.125 2024-08-10 06:29:24,847 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 from AS 2024-08-10 06:29:32,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=416680.0, ans=0.125 2024-08-10 06:29:38,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. 
limit=6.0 2024-08-10 06:30:00,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12700, loss[loss=0.1155, beats_loss=0.01472, ecapa_loss=0.0003032, whisper_loss=0.09778, over 17160.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01242, ecapa_loss=0.0002875, whisper_loss=0.0985, over 3880767.46 frames. ], batch size: 70, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:30:12,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=416880.0, ans=0.0 2024-08-10 06:30:23,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.052e+01 3.376e+01 3.987e+01 6.626e+01, threshold=6.752e+01, percent-clipped=0.0 2024-08-10 06:30:34,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416980.0, ans=0.1 2024-08-10 06:30:40,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-10 06:31:08,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=417180.0, ans=0.125 2024-08-10 06:31:29,332 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 from AS 2024-08-10 06:31:31,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=417280.0, ans=0.0 2024-08-10 06:31:32,796 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 06:31:40,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12750, loss[loss=0.09154, beats_loss=0.01387, ecapa_loss=0.0002783, whisper_loss=0.07489, over 16007.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01245, ecapa_loss=0.0002874, whisper_loss=0.09794, over 3863081.72 frames. 
], batch size: 65, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:32:04,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=417480.0, ans=0.125 2024-08-10 06:32:19,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=417580.0, ans=10.0 2024-08-10 06:32:30,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2024-08-10 06:32:38,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=417580.0, ans=0.125 2024-08-10 06:32:56,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=417680.0, ans=0.125 2024-08-10 06:33:12,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=417780.0, ans=0.125 2024-08-10 06:33:20,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12800, loss[loss=0.08939, beats_loss=0.01429, ecapa_loss=0.0002632, whisper_loss=0.07247, over 14118.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01259, ecapa_loss=0.000289, whisper_loss=0.0973, over 3859272.28 frames. ], batch size: 55, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:33:22,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=417880.0, ans=0.125 2024-08-10 06:33:42,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.114e+01 3.592e+01 4.168e+01 8.043e+01, threshold=7.184e+01, percent-clipped=1.0 2024-08-10 06:33:58,457 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 06:33:58,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=418080.0, ans=0.0 2024-08-10 06:34:10,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=418080.0, ans=0.2 2024-08-10 06:34:13,045 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.965e+00 2024-08-10 06:34:15,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=418080.0, ans=0.125 2024-08-10 06:34:33,496 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 29 from Vox, 42 from AS 2024-08-10 06:34:37,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418180.0, ans=0.1 2024-08-10 06:34:59,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12850, loss[loss=0.1302, beats_loss=0.01038, ecapa_loss=0.0002677, whisper_loss=0.1171, over 16323.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01258, ecapa_loss=0.0002906, whisper_loss=0.0967, over 3850884.16 frames. ], batch size: 62, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:35:04,186 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 from AS 2024-08-10 06:35:35,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=418580.0, ans=0.5 2024-08-10 06:35:42,467 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 06:35:44,911 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
17 from LS+wenet, 14 from Vox, 38 from AS 2024-08-10 06:35:45,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-10 06:35:49,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=418680.0, ans=0.2 2024-08-10 06:36:09,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12900, loss[loss=0.1061, beats_loss=0.0131, ecapa_loss=0.0003365, whisper_loss=0.08966, over 21208.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01251, ecapa_loss=0.0002917, whisper_loss=0.09617, over 3816765.13 frames. ], batch size: 88, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:36:12,663 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 06:36:25,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=12.0 2024-08-10 06:36:26,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.152e+01 3.621e+01 4.177e+01 6.125e+01, threshold=7.242e+01, percent-clipped=0.0 2024-08-10 06:36:36,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=419080.0, ans=0.125 2024-08-10 06:36:54,366 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 06:36:54,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=419180.0, ans=0.125 2024-08-10 06:37:03,390 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 06:37:19,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 12950, loss[loss=0.0976, beats_loss=0.01191, ecapa_loss=0.0002447, whisper_loss=0.08325, over 17848.00 frames. 
], tot_loss[loss=0.1115, beats_loss=0.01238, ecapa_loss=0.0002889, whisper_loss=0.09624, over 3838428.07 frames. ], batch size: 70, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:37:30,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=419380.0, ans=0.125 2024-08-10 06:37:36,420 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-10 06:37:43,297 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-10 06:37:50,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-10 06:37:52,892 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 26 from Vox, 23 from AS 2024-08-10 06:37:57,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=419580.0, ans=15.0 2024-08-10 06:38:01,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-10 06:38:12,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=419680.0, ans=0.125 2024-08-10 06:38:27,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=15.0 2024-08-10 06:38:28,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13000, loss[loss=0.1362, beats_loss=0.008663, ecapa_loss=0.0003499, whisper_loss=0.1241, over 19972.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01239, ecapa_loss=0.0002899, whisper_loss=0.09704, over 3846580.05 frames. 
], batch size: 77, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:38:45,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 3.317e+01 3.869e+01 4.527e+01 7.040e+01, threshold=7.738e+01, percent-clipped=0.0 2024-08-10 06:38:50,293 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 from AS 2024-08-10 06:38:53,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419980.0, ans=0.1 2024-08-10 06:39:42,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13050, loss[loss=0.1332, beats_loss=0.01076, ecapa_loss=0.0002868, whisper_loss=0.1196, over 18528.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01236, ecapa_loss=0.0002898, whisper_loss=0.09676, over 3833601.22 frames. ], batch size: 71, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:39:44,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=420380.0, ans=0.125 2024-08-10 06:40:06,752 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 25 from Vox, 26 from AS 2024-08-10 06:40:10,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.11 vs. limit=22.5 2024-08-10 06:40:19,648 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 06:40:20,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=420580.0, ans=0.125 2024-08-10 06:40:45,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=420780.0, ans=0.125 2024-08-10 06:40:48,755 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 12 from Vox, 29 from AS 2024-08-10 06:40:56,586 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13100, loss[loss=0.1308, beats_loss=0.01233, ecapa_loss=0.000336, whisper_loss=0.1151, over 20364.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01238, ecapa_loss=0.0002886, whisper_loss=0.09718, over 3855457.39 frames. ], batch size: 84, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:40:58,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=420880.0, ans=0.2 2024-08-10 06:41:03,299 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS 2024-08-10 06:41:05,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-10 06:41:14,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.548e+01 3.107e+01 3.501e+01 3.954e+01 7.732e+01, threshold=7.002e+01, percent-clipped=0.0 2024-08-10 06:41:17,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=420980.0, ans=0.2 2024-08-10 06:41:41,197 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 from AS 2024-08-10 06:42:12,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13150, loss[loss=0.1077, beats_loss=0.01207, ecapa_loss=0.000318, whisper_loss=0.0924, over 14786.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0123, ecapa_loss=0.0002905, whisper_loss=0.09723, over 3844801.18 frames. ], batch size: 60, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:42:22,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-10 06:42:31,312 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 06:42:37,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=421480.0, ans=0.2 2024-08-10 06:43:08,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2024-08-10 06:43:25,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13200, loss[loss=0.1253, beats_loss=0.009406, ecapa_loss=0.0002876, whisper_loss=0.113, over 20485.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01231, ecapa_loss=0.0002894, whisper_loss=0.09701, over 3846102.29 frames. ], batch size: 79, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:43:36,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=421880.0, ans=0.0 2024-08-10 06:43:37,676 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 06:43:42,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 3.092e+01 3.479e+01 4.168e+01 6.203e+01, threshold=6.958e+01, percent-clipped=0.0 2024-08-10 06:43:46,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=421980.0, ans=15.0 2024-08-10 06:43:53,705 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 29 from Vox, 42 from AS 2024-08-10 06:44:05,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=422080.0, ans=0.05 2024-08-10 06:44:20,357 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 29 from Vox, 30 from AS 2024-08-10 06:44:26,078 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 from AS 2024-08-10 06:44:27,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-10 06:44:27,737 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 06:44:41,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13250, loss[loss=0.1258, beats_loss=0.01266, ecapa_loss=0.0003362, whisper_loss=0.1098, over 23270.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01226, ecapa_loss=0.0002913, whisper_loss=0.09747, over 3856540.74 frames. ], batch size: 94, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:44:58,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=422480.0, ans=0.125 2024-08-10 06:45:18,910 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 06:45:27,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=422680.0, ans=0.1 2024-08-10 06:45:27,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422680.0, ans=0.125 2024-08-10 06:45:31,802 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 15 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 06:45:56,303 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS 2024-08-10 06:45:57,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13300, loss[loss=0.1286, beats_loss=0.01118, ecapa_loss=0.0003465, whisper_loss=0.1139, over 22671.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01244, ecapa_loss=0.0002881, whisper_loss=0.0965, over 3861199.06 frames. 
], batch size: 92, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:46:13,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=422980.0, ans=0.125 2024-08-10 06:46:15,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 3.388e+01 3.671e+01 4.200e+01 6.497e+01, threshold=7.342e+01, percent-clipped=0.0 2024-08-10 06:46:17,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=422980.0, ans=0.2 2024-08-10 06:46:18,772 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.450e+00 2024-08-10 06:46:21,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422980.0, ans=0.125 2024-08-10 06:46:22,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=422980.0, ans=0.125 2024-08-10 06:46:50,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0 2024-08-10 06:46:56,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=423280.0, ans=0.2 2024-08-10 06:46:57,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=15.0 2024-08-10 06:46:58,053 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 from AS 2024-08-10 06:47:08,120 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
17 from LS+wenet, 20 from Vox, 18 from AS 2024-08-10 06:47:10,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13350, loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.0002905, whisper_loss=0.09249, over 22940.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01228, ecapa_loss=0.0002925, whisper_loss=0.09771, over 3856016.40 frames. ], batch size: 90, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:47:23,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=423380.0, ans=0.125 2024-08-10 06:47:25,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-08-10 06:47:30,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=423480.0, ans=0.125 2024-08-10 06:47:35,126 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 06:47:37,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=423480.0, ans=0.0 2024-08-10 06:47:38,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-10 06:47:43,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=423580.0, ans=0.2 2024-08-10 06:47:45,174 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.315e+00 2024-08-10 06:47:51,277 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 06:48:10,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=423780.0, ans=0.0 2024-08-10 06:48:18,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423780.0, ans=0.125 2024-08-10 06:48:24,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13400, loss[loss=0.08214, beats_loss=0.01183, ecapa_loss=0.0002696, whisper_loss=0.06761, over 15680.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01223, ecapa_loss=0.0002938, whisper_loss=0.09802, over 3846336.69 frames. ], batch size: 61, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:48:38,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=423980.0, ans=0.0 2024-08-10 06:48:42,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 3.269e+01 3.722e+01 4.193e+01 5.690e+01, threshold=7.444e+01, percent-clipped=0.0 2024-08-10 06:48:49,422 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 06:48:57,252 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 06:48:58,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=424080.0, ans=0.0 2024-08-10 06:49:03,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=424080.0, ans=0.125 2024-08-10 06:49:07,470 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 06:49:08,728 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
22 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-10 06:49:10,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=424180.0, ans=0.125 2024-08-10 06:49:10,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-10 06:49:13,490 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 06:49:31,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=424280.0, ans=0.05 2024-08-10 06:49:38,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13450, loss[loss=0.1061, beats_loss=0.01518, ecapa_loss=0.000237, whisper_loss=0.08854, over 20734.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01235, ecapa_loss=0.0002924, whisper_loss=0.09677, over 3853921.69 frames. ], batch size: 80, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:49:40,562 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 06:49:40,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=424380.0, ans=0.2 2024-08-10 06:49:41,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2024-08-10 06:49:45,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=424380.0, ans=0.125 2024-08-10 06:49:57,685 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 06:50:16,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=424580.0, ans=0.2 2024-08-10 06:50:19,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. limit=10.0 2024-08-10 06:50:20,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=424680.0, ans=0.125 2024-08-10 06:50:29,188 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 06:50:32,128 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 06:50:42,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=424780.0, ans=0.1 2024-08-10 06:50:45,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=424780.0, ans=0.5 2024-08-10 06:50:50,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13500, loss[loss=0.129, beats_loss=0.01461, ecapa_loss=0.0002738, whisper_loss=0.1116, over 21634.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01238, ecapa_loss=0.0002921, whisper_loss=0.09627, over 3830699.30 frames. 
], batch size: 88, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:51:07,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.316e+01 3.785e+01 4.530e+01 1.081e+02, threshold=7.570e+01, percent-clipped=1.0 2024-08-10 06:51:11,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=424980.0, ans=0.125 2024-08-10 06:51:21,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=425080.0, ans=0.125 2024-08-10 06:51:23,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=425080.0, ans=0.0 2024-08-10 06:51:24,026 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 06:51:26,903 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 06:51:27,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-10 06:51:29,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425080.0, ans=0.1 2024-08-10 06:51:31,195 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 06:51:44,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425180.0, ans=0.1 2024-08-10 06:51:44,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425180.0, ans=0.1 2024-08-10 06:52:01,920 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13550, loss[loss=0.1011, beats_loss=0.01177, ecapa_loss=0.0003287, whisper_loss=0.08601, over 19591.00 frames. 
], tot_loss[loss=0.1118, beats_loss=0.01241, ecapa_loss=0.0002908, whisper_loss=0.09644, over 3816887.92 frames. ], batch size: 82, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:52:22,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=12.0 2024-08-10 06:52:23,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=425480.0, ans=0.125 2024-08-10 06:52:32,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2024-08-10 06:52:43,146 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 06:52:44,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-10 06:52:56,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425680.0, ans=0.125 2024-08-10 06:53:02,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=425780.0, ans=0.5 2024-08-10 06:53:13,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13600, loss[loss=0.1338, beats_loss=0.01218, ecapa_loss=0.0002943, whisper_loss=0.1187, over 17965.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01259, ecapa_loss=0.0002881, whisper_loss=0.09594, over 3821333.73 frames. 
], batch size: 71, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:53:24,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=425880.0, ans=0.125 2024-08-10 06:53:30,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 3.163e+01 3.442e+01 4.144e+01 6.667e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 06:53:34,379 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2024-08-10 06:53:55,190 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 06:54:05,105 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 06:54:14,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2024-08-10 06:54:24,376 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13650, loss[loss=0.1194, beats_loss=0.0101, ecapa_loss=0.0003251, whisper_loss=0.106, over 20154.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01264, ecapa_loss=0.0002893, whisper_loss=0.09589, over 3822049.80 frames. ], batch size: 79, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:54:31,412 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 06:54:38,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=426480.0, ans=0.0 2024-08-10 06:54:39,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2024-08-10 06:54:41,188 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 06:54:42,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=426480.0, ans=0.125 2024-08-10 06:54:54,790 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-10 06:55:00,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=426580.0, ans=0.0 2024-08-10 06:55:05,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=426680.0, ans=0.125 2024-08-10 06:55:33,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13700, loss[loss=0.1504, beats_loss=0.01023, ecapa_loss=0.0003363, whisper_loss=0.1368, over 21913.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01251, ecapa_loss=0.0002919, whisper_loss=0.09717, over 3846529.56 frames. ], batch size: 86, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:55:33,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=426880.0, ans=0.125 2024-08-10 06:55:41,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426880.0, ans=0.1 2024-08-10 06:55:48,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.09 vs. 
limit=15.0 2024-08-10 06:55:49,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 3.221e+01 3.630e+01 4.052e+01 7.780e+01, threshold=7.261e+01, percent-clipped=2.0 2024-08-10 06:55:59,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=427080.0, ans=0.2 2024-08-10 06:56:04,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2024-08-10 06:56:43,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13750, loss[loss=0.116, beats_loss=0.01106, ecapa_loss=0.0003484, whisper_loss=0.1015, over 16838.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01237, ecapa_loss=0.0002926, whisper_loss=0.09783, over 3844815.70 frames. ], batch size: 68, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:56:59,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=27.76 vs. limit=15.0 2024-08-10 06:57:07,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=427480.0, ans=0.125 2024-08-10 06:57:38,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=427780.0, ans=0.2 2024-08-10 06:57:53,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13800, loss[loss=0.126, beats_loss=0.008488, ecapa_loss=0.0002853, whisper_loss=0.1146, over 17524.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.0123, ecapa_loss=0.0002914, whisper_loss=0.09798, over 3857882.07 frames. 
], batch size: 67, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:57:56,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427880.0, ans=0.125 2024-08-10 06:58:02,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2024-08-10 06:58:07,813 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 06:58:10,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 3.325e+01 3.732e+01 4.469e+01 6.721e+01, threshold=7.464e+01, percent-clipped=0.0 2024-08-10 06:58:19,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 06:58:42,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=428180.0, ans=0.125 2024-08-10 06:58:46,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=428180.0, ans=0.125 2024-08-10 06:58:53,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428280.0, ans=0.1 2024-08-10 06:58:53,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=428280.0, ans=0.125 2024-08-10 06:59:02,315 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13850, loss[loss=0.1169, beats_loss=0.01321, ecapa_loss=0.0002676, whisper_loss=0.101, over 20892.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01226, ecapa_loss=0.0002898, whisper_loss=0.09864, over 3900851.62 frames. 
], batch size: 84, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:59:05,764 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-10 06:59:23,000 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 06:59:23,570 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.0 2024-08-10 06:59:24,207 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 06:59:27,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428480.0, ans=0.1 2024-08-10 06:59:36,220 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 06:59:44,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=428680.0, ans=0.125 2024-08-10 06:59:46,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428680.0, ans=0.1 2024-08-10 06:59:55,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-10 07:00:09,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13900, loss[loss=0.1115, beats_loss=0.013, ecapa_loss=0.0003198, whisper_loss=0.0953, over 20045.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01236, ecapa_loss=0.000291, whisper_loss=0.0988, over 3902396.02 frames. ], batch size: 84, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:00:10,253 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 07:00:24,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=12.0 2024-08-10 07:00:26,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.310e+01 3.794e+01 4.612e+01 1.013e+02, threshold=7.587e+01, percent-clipped=2.0 2024-08-10 07:00:59,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=429180.0, ans=15.0 2024-08-10 07:01:04,685 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 07:01:07,371 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 07:01:18,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 13950, loss[loss=0.1041, beats_loss=0.01335, ecapa_loss=0.0002548, whisper_loss=0.08817, over 17556.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01233, ecapa_loss=0.0002907, whisper_loss=0.09873, over 3891088.55 frames. ], batch size: 71, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:01:20,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-10 07:01:23,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429380.0, ans=0.125 2024-08-10 07:01:27,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=429380.0, ans=0.1 2024-08-10 07:01:33,802 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 07:02:12,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=429780.0, ans=0.0 2024-08-10 07:02:16,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=429780.0, ans=0.125 2024-08-10 07:02:17,397 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 07:02:18,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=429780.0, ans=0.2 2024-08-10 07:02:21,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429780.0, ans=0.1 2024-08-10 07:02:26,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14000, loss[loss=0.09711, beats_loss=0.01203, ecapa_loss=0.0003633, whisper_loss=0.08145, over 19804.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01238, ecapa_loss=0.0002889, whisper_loss=0.09861, over 3901999.43 frames. ], batch size: 85, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:02:43,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 3.368e+01 3.902e+01 4.630e+01 2.044e+02, threshold=7.804e+01, percent-clipped=2.0 2024-08-10 07:02:45,991 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:02:48,598 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 07:02:48,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=429980.0, ans=0.09899494936611666 2024-08-10 07:03:28,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=430280.0, ans=0.2 2024-08-10 07:03:35,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14050, loss[loss=0.1021, beats_loss=0.01204, ecapa_loss=0.0002992, whisper_loss=0.08706, over 22034.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01237, ecapa_loss=0.0002875, whisper_loss=0.09842, over 3900046.59 frames. ], batch size: 91, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:03:49,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=430480.0, ans=0.125 2024-08-10 07:03:52,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-10 07:03:58,009 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 12 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 07:04:06,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=430580.0, ans=0.2 2024-08-10 07:04:09,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-10 07:04:21,249 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 07:04:30,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.44 vs. 
limit=12.0 2024-08-10 07:04:34,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430780.0, ans=0.125 2024-08-10 07:04:34,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-10 07:04:34,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=430780.0, ans=15.0 2024-08-10 07:04:44,923 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14100, loss[loss=0.1022, beats_loss=0.01386, ecapa_loss=0.0003198, whisper_loss=0.0851, over 15233.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01234, ecapa_loss=0.0002874, whisper_loss=0.09816, over 3897279.83 frames. ], batch size: 62, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:04:48,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2024-08-10 07:05:00,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 3.108e+01 3.411e+01 4.014e+01 7.175e+01, threshold=6.821e+01, percent-clipped=1.0 2024-08-10 07:05:06,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=430980.0, ans=0.125 2024-08-10 07:05:08,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=430980.0, ans=0.0 2024-08-10 07:05:18,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2024-08-10 07:05:33,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. 
limit=15.0 2024-08-10 07:05:39,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=431280.0, ans=0.125 2024-08-10 07:05:52,301 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14150, loss[loss=0.08272, beats_loss=0.01617, ecapa_loss=0.000241, whisper_loss=0.06414, over 21021.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0124, ecapa_loss=0.0002877, whisper_loss=0.09717, over 3927539.80 frames. ], batch size: 88, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:06:00,676 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 07:06:09,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431480.0, ans=0.1 2024-08-10 07:06:14,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-08-10 07:06:22,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431580.0, ans=0.1 2024-08-10 07:06:29,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=431580.0, ans=0.125 2024-08-10 07:06:36,559 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 07:06:36,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=431680.0, ans=0.0 2024-08-10 07:06:39,411 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 07:06:57,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=431780.0, ans=0.125 2024-08-10 07:07:01,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14200, loss[loss=0.115, beats_loss=0.01258, ecapa_loss=0.000279, whisper_loss=0.09966, over 22296.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01244, ecapa_loss=0.0002853, whisper_loss=0.09714, over 3909013.27 frames. ], batch size: 91, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:07:01,497 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 07:07:14,292 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 07:07:18,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.227e+01 3.786e+01 4.277e+01 7.139e+01, threshold=7.572e+01, percent-clipped=1.0 2024-08-10 07:07:21,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=431980.0, ans=0.125 2024-08-10 07:07:39,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=432080.0, ans=0.125 2024-08-10 07:07:59,353 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.433e+01 2024-08-10 07:08:02,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.33 vs. limit=15.0 2024-08-10 07:08:11,368 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14250, loss[loss=0.0879, beats_loss=0.01203, ecapa_loss=0.0002942, whisper_loss=0.07293, over 16305.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01237, ecapa_loss=0.0002862, whisper_loss=0.09753, over 3903920.05 frames. 
], batch size: 64, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:08:25,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432480.0, ans=0.1 2024-08-10 07:08:51,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2024-08-10 07:08:54,726 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 07:09:00,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=432680.0, ans=0.0 2024-08-10 07:09:13,857 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 07:09:15,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=432780.0, ans=0.05 2024-08-10 07:09:16,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=432780.0, ans=0.125 2024-08-10 07:09:20,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14300, loss[loss=0.131, beats_loss=0.0106, ecapa_loss=0.0002726, whisper_loss=0.1177, over 22228.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01237, ecapa_loss=0.0002859, whisper_loss=0.09748, over 3889901.49 frames. ], batch size: 87, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:09:32,724 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 07:09:34,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=432980.0, ans=0.125 2024-08-10 07:09:35,613 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 07:09:36,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 3.216e+01 3.597e+01 4.195e+01 6.015e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:09:53,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2024-08-10 07:10:09,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433180.0, ans=0.1 2024-08-10 07:10:14,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=433280.0, ans=0.125 2024-08-10 07:10:15,485 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 07:10:18,188 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 42 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 07:10:19,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=433280.0, ans=0.0 2024-08-10 07:10:21,006 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 07:10:28,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14350, loss[loss=0.08577, beats_loss=0.01268, ecapa_loss=0.0002291, whisper_loss=0.0708, over 14974.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01235, ecapa_loss=0.0002854, whisper_loss=0.09769, over 3875913.62 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:10:31,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433380.0, ans=0.125 2024-08-10 07:10:35,625 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 07:10:41,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433480.0, ans=0.1 2024-08-10 07:10:46,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433480.0, ans=0.1 2024-08-10 07:10:50,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=433480.0, ans=0.0 2024-08-10 07:10:56,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433580.0, ans=0.125 2024-08-10 07:10:56,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=433580.0, ans=0.0 2024-08-10 07:11:07,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=433580.0, ans=0.07 2024-08-10 07:11:10,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.98 vs. limit=22.5 2024-08-10 07:11:30,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=433780.0, ans=0.125 2024-08-10 07:11:36,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14400, loss[loss=0.1085, beats_loss=0.01456, ecapa_loss=0.0002734, whisper_loss=0.09118, over 23048.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01244, ecapa_loss=0.0002845, whisper_loss=0.09744, over 3889226.76 frames. ], batch size: 93, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:11:42,730 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 07:11:50,992 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 07:11:53,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 3.382e+01 3.755e+01 4.286e+01 6.808e+01, threshold=7.511e+01, percent-clipped=0.0 2024-08-10 07:12:02,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433980.0, ans=0.125 2024-08-10 07:12:11,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=434080.0, ans=0.125 2024-08-10 07:12:12,543 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 07:12:18,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=434180.0, ans=0.0 2024-08-10 07:12:27,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=434180.0, ans=0.125 2024-08-10 07:12:35,738 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 07:12:37,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434280.0, ans=0.1 2024-08-10 07:12:45,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 3, batch 14450, loss[loss=0.11, beats_loss=0.01192, ecapa_loss=0.0003482, whisper_loss=0.09456, over 19052.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01251, ecapa_loss=0.000284, whisper_loss=0.09718, over 3891590.35 frames. ], batch size: 78, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:13:00,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=434480.0, ans=0.2 2024-08-10 07:13:16,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.37 vs. 
limit=15.0 2024-08-10 07:13:18,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=434580.0, ans=0.125 2024-08-10 07:14:14,462 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 0, loss[loss=0.09122, beats_loss=0.01326, ecapa_loss=0.0003553, whisper_loss=0.07441, over 19807.00 frames. ], tot_loss[loss=0.09122, beats_loss=0.01326, ecapa_loss=0.0003553, whisper_loss=0.07441, over 19807.00 frames. ], batch size: 82, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:14:14,463 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 07:14:55,855 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on ASR_libri: loss=0.268, beats_loss=0, ecapa_loss=0.0008857, whisper_loss=0.2592, over 922467.00 frames. 2024-08-10 07:15:10,801 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on SV_voxceleb1: loss=0.007801, beats_loss=0, ecapa_loss=0.0007801, whisper_loss=0, over 939242.00 frames. 2024-08-10 07:15:39,011 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0147, 0.0219, 0.0057, 1.5992, 0.0095, 0.0345, 0.0470, 0.0242], device='cuda:1') 2024-08-10 07:17:09,535 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on AT_audioset: loss=0.02834, beats_loss=0.02834, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 07:17:09,538 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 07:17:24,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=434770.0, ans=0.2 2024-08-10 07:18:01,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=434870.0, ans=0.125 2024-08-10 07:18:10,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=434970.0, ans=0.0 2024-08-10 07:18:11,122 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 07:18:12,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.318e+01 3.888e+01 4.583e+01 8.270e+01, threshold=7.777e+01, percent-clipped=1.0 2024-08-10 07:18:15,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2024-08-10 07:18:41,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435070.0, ans=0.125 2024-08-10 07:19:08,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=435170.0, ans=0.125 2024-08-10 07:19:18,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435270.0, ans=0.0 2024-08-10 07:19:19,971 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 50, loss[loss=0.1049, beats_loss=0.01019, ecapa_loss=0.0002957, whisper_loss=0.09172, over 15999.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01221, ecapa_loss=0.0002931, whisper_loss=0.09639, over 869900.14 frames. 
], batch size: 62, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:19:46,549 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 07:19:58,784 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 33 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 07:20:01,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-08-10 07:20:13,327 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 07:20:23,527 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 07:20:43,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=435570.0, ans=0.2 2024-08-10 07:20:47,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=435570.0, ans=0.125 2024-08-10 07:21:15,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=435670.0, ans=0.035 2024-08-10 07:21:21,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 100, loss[loss=0.1119, beats_loss=0.009507, ecapa_loss=0.0002872, whisper_loss=0.09953, over 17570.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01211, ecapa_loss=0.0002896, whisper_loss=0.09591, over 1519953.81 frames. 
], batch size: 70, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:21:22,406 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:21:46,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=435870.0, ans=0.0 2024-08-10 07:21:49,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=435870.0, ans=0.2 2024-08-10 07:21:53,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=435870.0, ans=0.0 2024-08-10 07:22:14,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.841e+01 3.372e+01 3.715e+01 4.340e+01 6.479e+01, threshold=7.429e+01, percent-clipped=0.0 2024-08-10 07:22:22,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435970.0, ans=0.0 2024-08-10 07:22:25,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435970.0, ans=0.1 2024-08-10 07:22:26,788 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 07:22:30,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2024-08-10 07:22:38,426 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 07:22:49,444 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-10 07:22:51,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=436170.0, ans=0.125 2024-08-10 07:23:14,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 150, loss[loss=0.08514, beats_loss=0.0127, ecapa_loss=0.0002609, whisper_loss=0.06983, over 16674.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01198, ecapa_loss=0.0002857, whisper_loss=0.09606, over 2015521.04 frames. ], batch size: 67, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:23:23,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=436270.0, ans=0.125 2024-08-10 07:23:31,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=8.0 2024-08-10 07:23:35,338 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 07:23:41,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=12.0 2024-08-10 07:24:03,912 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 07:24:05,919 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 07:24:07,676 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 07:24:27,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=436670.0, ans=0.0 2024-08-10 07:24:38,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 200, loss[loss=0.1295, beats_loss=0.01006, ecapa_loss=0.0003557, whisper_loss=0.1159, over 20996.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01183, ecapa_loss=0.0002859, whisper_loss=0.09768, over 2425235.88 frames. 
], batch size: 84, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:24:41,221 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 07:24:51,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436770.0, ans=0.1 2024-08-10 07:25:14,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.326e+01 3.682e+01 4.488e+01 7.047e+01, threshold=7.364e+01, percent-clipped=0.0 2024-08-10 07:25:14,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=436970.0, ans=0.0 2024-08-10 07:25:27,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=437070.0, ans=0.125 2024-08-10 07:25:42,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437170.0, ans=0.125 2024-08-10 07:25:57,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 250, loss[loss=0.1262, beats_loss=0.01069, ecapa_loss=0.000296, whisper_loss=0.1125, over 16308.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01191, ecapa_loss=0.0002849, whisper_loss=0.09799, over 2739301.89 frames. ], batch size: 62, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:26:13,252 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 07:26:16,071 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 07:26:16,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=437370.0, ans=0.0 2024-08-10 07:26:44,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=437570.0, ans=0.125 2024-08-10 07:26:52,580 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 07:27:02,591 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 07:27:07,132 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 07:27:12,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.90 vs. limit=22.5 2024-08-10 07:27:12,781 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 300, loss[loss=0.118, beats_loss=0.01139, ecapa_loss=0.0003897, whisper_loss=0.1027, over 20209.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01207, ecapa_loss=0.0002814, whisper_loss=0.09631, over 2972622.74 frames. ], batch size: 87, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:27:23,293 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 07:27:35,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437870.0, ans=0.125 2024-08-10 07:27:41,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437970.0, ans=0.1 2024-08-10 07:27:43,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=437970.0, ans=0.2 2024-08-10 07:27:43,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437970.0, ans=0.0 2024-08-10 07:27:46,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.173e+01 3.597e+01 4.305e+01 6.522e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:27:49,827 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 07:27:53,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=437970.0, ans=0.04949747468305833 2024-08-10 07:28:00,236 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 07:28:07,932 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-10 07:28:23,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=438170.0, ans=0.125 2024-08-10 07:28:27,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 350, loss[loss=0.146, beats_loss=0.009414, ecapa_loss=0.000309, whisper_loss=0.1334, over 19229.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01221, ecapa_loss=0.0002755, whisper_loss=0.09606, over 3169624.86 frames. 
], batch size: 75, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:28:29,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=438270.0, ans=0.1 2024-08-10 07:28:37,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=438270.0, ans=0.125 2024-08-10 07:28:39,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=22.5 2024-08-10 07:29:00,788 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 07:29:42,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 400, loss[loss=0.1044, beats_loss=0.01137, ecapa_loss=0.0003341, whisper_loss=0.08971, over 21080.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01221, ecapa_loss=0.0002748, whisper_loss=0.09577, over 3328080.71 frames. ], batch size: 85, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:30:04,423 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 18 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-10 07:30:08,403 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 07:30:16,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 3.285e+01 3.710e+01 4.185e+01 8.184e+01, threshold=7.420e+01, percent-clipped=1.0 2024-08-10 07:30:37,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=439070.0, ans=0.125 2024-08-10 07:30:44,694 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 07:30:54,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 450, loss[loss=0.1096, beats_loss=0.01292, ecapa_loss=0.0002824, whisper_loss=0.09388, over 22136.00 frames. 
], tot_loss[loss=0.1106, beats_loss=0.01212, ecapa_loss=0.0002742, whisper_loss=0.09577, over 3428375.19 frames. ], batch size: 87, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:31:09,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=439370.0, ans=0.125 2024-08-10 07:31:21,810 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 07:31:23,236 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 07:31:39,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-10 07:31:55,816 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-10 07:32:00,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 500, loss[loss=0.09891, beats_loss=0.01434, ecapa_loss=0.00032, whisper_loss=0.08137, over 21594.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01203, ecapa_loss=0.0002749, whisper_loss=0.09539, over 3492445.62 frames. ], batch size: 93, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:32:13,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=439870.0, ans=0.125 2024-08-10 07:32:18,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-08-10 07:32:33,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.971e+01 3.310e+01 3.858e+01 7.927e+01, threshold=6.621e+01, percent-clipped=1.0 2024-08-10 07:32:35,478 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
19 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-10 07:32:47,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2024-08-10 07:32:51,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=440070.0, ans=0.125 2024-08-10 07:32:51,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440070.0, ans=0.125 2024-08-10 07:32:55,288 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 07:32:56,615 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 07:33:05,844 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 07:33:09,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 550, loss[loss=0.08266, beats_loss=0.008939, ecapa_loss=0.0003033, whisper_loss=0.07069, over 14479.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01206, ecapa_loss=0.0002715, whisper_loss=0.09536, over 3582922.31 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:33:12,098 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 36 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 07:33:24,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-08-10 07:33:31,822 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 07:33:33,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=440370.0, ans=0.0 2024-08-10 07:33:37,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=440470.0, ans=0.07 2024-08-10 07:33:47,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=440470.0, ans=0.0 2024-08-10 07:33:56,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=440570.0, ans=0.2 2024-08-10 07:34:01,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=440670.0, ans=0.125 2024-08-10 07:34:14,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 600, loss[loss=0.143, beats_loss=0.007739, ecapa_loss=0.0002634, whisper_loss=0.1326, over 16495.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01199, ecapa_loss=0.0002693, whisper_loss=0.0967, over 3657800.22 frames. 
], batch size: 60, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:34:15,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=440770.0, ans=0.125 2024-08-10 07:34:26,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=440770.0, ans=0.07 2024-08-10 07:34:31,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=440870.0, ans=0.125 2024-08-10 07:34:45,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 3.004e+01 3.329e+01 3.797e+01 6.092e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 07:34:49,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440970.0, ans=0.125 2024-08-10 07:34:54,562 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 07:34:54,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441070.0, ans=0.125 2024-08-10 07:34:58,618 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-10 07:35:01,176 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 07:35:20,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 650, loss[loss=0.1128, beats_loss=0.01268, ecapa_loss=0.0003133, whisper_loss=0.09699, over 21282.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01208, ecapa_loss=0.0002672, whisper_loss=0.09607, over 3661583.70 frames. ], batch size: 89, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:35:37,347 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 07:35:45,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=441470.0, ans=0.125 2024-08-10 07:36:05,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=441570.0, ans=10.0 2024-08-10 07:36:12,940 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 07:36:13,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-10 07:36:14,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=441670.0, ans=0.125 2024-08-10 07:36:18,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=441670.0, ans=0.0 2024-08-10 07:36:26,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 700, loss[loss=0.1204, beats_loss=0.01316, ecapa_loss=0.000229, whisper_loss=0.105, over 24505.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.012, ecapa_loss=0.0002687, whisper_loss=0.09689, over 3682736.77 frames. ], batch size: 94, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:36:51,607 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 07:36:53,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441970.0, ans=0.125 2024-08-10 07:36:55,286 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 07:36:56,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 3.137e+01 3.551e+01 4.143e+01 1.211e+02, threshold=7.103e+01, percent-clipped=4.0 2024-08-10 07:37:06,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442070.0, ans=0.1 2024-08-10 07:37:10,080 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 07:37:16,541 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 07:37:16,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=442070.0, ans=0.0 2024-08-10 07:37:32,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 750, loss[loss=0.1046, beats_loss=0.009251, ecapa_loss=0.0003227, whisper_loss=0.09214, over 14643.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01197, ecapa_loss=0.0002698, whisper_loss=0.0975, over 3682659.62 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:37:32,518 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 07:37:36,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=442270.0, ans=0.125 2024-08-10 07:37:40,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=442270.0, ans=12.0 2024-08-10 07:37:42,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=442270.0, ans=0.125 2024-08-10 07:37:53,358 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
20 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-10 07:38:13,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442570.0, ans=0.125 2024-08-10 07:38:18,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=442570.0, ans=0.125 2024-08-10 07:38:19,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=442570.0, ans=0.5 2024-08-10 07:38:28,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=442670.0, ans=0.125 2024-08-10 07:38:30,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442670.0, ans=0.1 2024-08-10 07:38:37,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 800, loss[loss=0.121, beats_loss=0.01113, ecapa_loss=0.0002282, whisper_loss=0.1076, over 19276.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01205, ecapa_loss=0.0002685, whisper_loss=0.0964, over 3739247.14 frames. ], batch size: 71, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:39:07,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 2.938e+01 3.331e+01 3.852e+01 7.963e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-10 07:39:12,995 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-10 07:39:14,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=15.0 2024-08-10 07:39:20,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=443070.0, ans=0.025 2024-08-10 07:39:40,309 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 07:39:42,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 850, loss[loss=0.1178, beats_loss=0.01138, ecapa_loss=0.0003005, whisper_loss=0.1034, over 22607.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01202, ecapa_loss=0.0002687, whisper_loss=0.09567, over 3721499.19 frames. ], batch size: 91, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:39:47,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=443270.0, ans=0.07 2024-08-10 07:39:55,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=443370.0, ans=0.125 2024-08-10 07:40:00,162 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 07:40:04,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=443370.0, ans=0.0 2024-08-10 07:40:11,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-08-10 07:40:14,765 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 07:40:22,778 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 07:40:24,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=443570.0, ans=0.04949747468305833 2024-08-10 07:40:27,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=443570.0, ans=0.125 2024-08-10 07:40:32,040 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 07:40:32,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=443570.0, ans=0.0 2024-08-10 07:40:48,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 900, loss[loss=0.08459, beats_loss=0.01427, ecapa_loss=0.0002508, whisper_loss=0.06782, over 16866.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01212, ecapa_loss=0.0002692, whisper_loss=0.09557, over 3768108.84 frames. ], batch size: 68, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:40:50,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=443770.0, ans=0.125 2024-08-10 07:40:54,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=443770.0, ans=0.2 2024-08-10 07:41:17,666 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 13 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 07:41:18,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.112e+01 3.456e+01 3.897e+01 5.995e+01, threshold=6.912e+01, percent-clipped=0.0 2024-08-10 07:41:24,130 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 07:41:26,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444070.0, ans=0.125 2024-08-10 07:41:49,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2024-08-10 07:41:53,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 950, loss[loss=0.1139, beats_loss=0.01146, ecapa_loss=0.0002475, whisper_loss=0.1, over 15195.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01217, ecapa_loss=0.000268, whisper_loss=0.09574, over 3791652.27 frames. 
], batch size: 57, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:42:26,359 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 07:42:54,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=444670.0, ans=0.2 2024-08-10 07:42:55,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-10 07:42:59,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1000, loss[loss=0.09106, beats_loss=0.01248, ecapa_loss=0.0002832, whisper_loss=0.07574, over 13827.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0122, ecapa_loss=0.000266, whisper_loss=0.09561, over 3821999.15 frames. ], batch size: 55, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:43:15,013 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 07:43:24,031 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 07:43:25,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=444970.0, ans=0.125 2024-08-10 07:43:26,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-08-10 07:43:29,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.226e+01 3.648e+01 4.312e+01 7.271e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 07:43:32,364 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
41 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 07:43:46,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=445070.0, ans=0.0 2024-08-10 07:43:49,319 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 07:44:04,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1050, loss[loss=0.121, beats_loss=0.01095, ecapa_loss=0.0002712, whisper_loss=0.1073, over 15194.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01216, ecapa_loss=0.0002664, whisper_loss=0.09602, over 3837976.77 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:44:06,294 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 7 from Vox, 28 fro AS 2024-08-10 07:44:10,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=445270.0, ans=0.125 2024-08-10 07:44:14,340 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 07:44:23,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=445370.0, ans=0.95 2024-08-10 07:44:33,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-10 07:44:39,999 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 07:44:45,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445570.0, ans=0.1 2024-08-10 07:44:59,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=445670.0, ans=0.2 2024-08-10 07:45:09,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1100, loss[loss=0.1145, beats_loss=0.01146, ecapa_loss=0.000293, whisper_loss=0.1001, over 16551.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01215, ecapa_loss=0.0002651, whisper_loss=0.09651, over 3836332.62 frames. ], batch size: 66, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:45:10,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=445770.0, ans=0.025 2024-08-10 07:45:21,363 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 07:45:24,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445870.0, ans=0.1 2024-08-10 07:45:30,527 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 07:45:32,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=445870.0, ans=0.0 2024-08-10 07:45:35,667 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
34 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 07:45:35,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=445970.0, ans=0.2 2024-08-10 07:45:39,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.161e+01 3.477e+01 3.934e+01 8.780e+01, threshold=6.953e+01, percent-clipped=2.0 2024-08-10 07:45:47,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=446070.0, ans=0.125 2024-08-10 07:45:57,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=446070.0, ans=0.125 2024-08-10 07:46:13,436 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 07:46:14,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1150, loss[loss=0.1183, beats_loss=0.00991, ecapa_loss=0.0002912, whisper_loss=0.1055, over 22060.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01205, ecapa_loss=0.0002661, whisper_loss=0.09682, over 3847624.40 frames. ], batch size: 87, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:46:29,071 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 07:46:30,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446370.0, ans=0.1 2024-08-10 07:46:43,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446470.0, ans=0.1 2024-08-10 07:46:50,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446470.0, ans=0.0 2024-08-10 07:46:51,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.02 vs. 
limit=22.5 2024-08-10 07:46:58,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=446570.0, ans=0.0 2024-08-10 07:47:06,170 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-10 07:47:10,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=446670.0, ans=0.2 2024-08-10 07:47:17,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446670.0, ans=0.0 2024-08-10 07:47:20,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1200, loss[loss=0.1239, beats_loss=0.009955, ecapa_loss=0.0002774, whisper_loss=0.1112, over 21688.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01209, ecapa_loss=0.0002651, whisper_loss=0.09651, over 3832092.93 frames. ], batch size: 87, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:47:30,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2024-08-10 07:47:50,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.044e+01 3.412e+01 3.944e+01 6.015e+01, threshold=6.823e+01, percent-clipped=0.0 2024-08-10 07:48:04,672 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 07:48:08,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447070.0, ans=0.1 2024-08-10 07:48:08,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=447070.0, ans=0.125 2024-08-10 07:48:19,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=447170.0, ans=0.125 2024-08-10 07:48:24,506 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 07:48:28,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1250, loss[loss=0.1107, beats_loss=0.0111, ecapa_loss=0.0003535, whisper_loss=0.09608, over 20180.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01216, ecapa_loss=0.0002639, whisper_loss=0.0963, over 3825787.02 frames. ], batch size: 86, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:48:33,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=447270.0, ans=0.07 2024-08-10 07:48:40,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447270.0, ans=0.125 2024-08-10 07:48:43,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2024-08-10 07:48:47,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=447370.0, ans=0.0 2024-08-10 07:48:47,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.23 vs. 
limit=10.0 2024-08-10 07:48:50,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=447370.0, ans=0.05 2024-08-10 07:48:54,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=447370.0, ans=0.0 2024-08-10 07:48:57,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447470.0, ans=0.1 2024-08-10 07:48:58,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447470.0, ans=0.125 2024-08-10 07:49:18,919 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 07:49:23,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=447570.0, ans=0.0 2024-08-10 07:49:27,244 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 07:49:34,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=447670.0, ans=0.125 2024-08-10 07:49:39,971 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1300, loss[loss=0.1357, beats_loss=0.01059, ecapa_loss=0.0002264, whisper_loss=0.1228, over 19000.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01216, ecapa_loss=0.0002621, whisper_loss=0.09604, over 3829027.43 frames. 
], batch size: 70, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:49:41,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=447770.0, ans=0.1 2024-08-10 07:49:43,253 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.134e-02 2024-08-10 07:50:03,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.38 vs. limit=22.5 2024-08-10 07:50:12,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 3.001e+01 3.337e+01 3.796e+01 6.277e+01, threshold=6.674e+01, percent-clipped=0.0 2024-08-10 07:50:19,277 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 07:50:26,770 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 07:50:30,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2024-08-10 07:50:46,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=448170.0, ans=0.0 2024-08-10 07:50:51,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1350, loss[loss=0.1023, beats_loss=0.01268, ecapa_loss=0.0002389, whisper_loss=0.08728, over 22145.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01212, ecapa_loss=0.0002621, whisper_loss=0.09713, over 3848091.19 frames. 
], batch size: 89, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:50:55,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=448270.0, ans=0.125 2024-08-10 07:51:11,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-10 07:51:14,921 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-10 07:51:18,102 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 07:51:24,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=448470.0, ans=0.125 2024-08-10 07:51:30,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=448470.0, ans=0.125 2024-08-10 07:51:31,586 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 19 from Vox, 53 fro AS 2024-08-10 07:51:35,423 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 07:51:42,950 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 07:51:46,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-10 07:52:03,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1400, loss[loss=0.1175, beats_loss=0.01322, ecapa_loss=0.0002593, whisper_loss=0.1017, over 22942.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01212, ecapa_loss=0.0002611, whisper_loss=0.09684, over 3836875.89 frames. 
], batch size: 93, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:52:20,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448870.0, ans=0.1 2024-08-10 07:52:34,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=448970.0, ans=0.125 2024-08-10 07:52:37,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.977e+01 3.358e+01 3.939e+01 6.744e+01, threshold=6.717e+01, percent-clipped=2.0 2024-08-10 07:52:42,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=448970.0, ans=0.0 2024-08-10 07:52:45,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-08-10 07:52:48,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2024-08-10 07:52:49,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=449070.0, ans=0.125 2024-08-10 07:52:56,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449070.0, ans=0.1 2024-08-10 07:53:17,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1450, loss[loss=0.1328, beats_loss=0.01001, ecapa_loss=0.0002161, whisper_loss=0.1207, over 18039.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01221, ecapa_loss=0.0002608, whisper_loss=0.09643, over 3848550.57 frames. ], batch size: 66, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:54:00,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.12 vs. 
limit=22.5 2024-08-10 07:54:34,746 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 07:55:00,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1500, loss[loss=0.1017, beats_loss=0.01337, ecapa_loss=0.0002551, whisper_loss=0.08574, over 16601.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01228, ecapa_loss=0.0002606, whisper_loss=0.09572, over 3854693.12 frames. ], batch size: 67, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:55:35,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.938e+01 3.327e+01 3.975e+01 6.102e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-10 07:55:37,536 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 07:55:39,311 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 07:56:16,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1550, loss[loss=0.1008, beats_loss=0.01134, ecapa_loss=0.000271, whisper_loss=0.08671, over 23159.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01217, ecapa_loss=0.0002605, whisper_loss=0.09602, over 3828309.45 frames. ], batch size: 91, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:56:16,848 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 07:56:26,327 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 07:56:54,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.16 vs. 
limit=10.0 2024-08-10 07:57:08,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=450570.0, ans=0.2 2024-08-10 07:57:32,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1600, loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0002219, whisper_loss=0.09387, over 19319.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01217, ecapa_loss=0.0002596, whisper_loss=0.09617, over 3839299.82 frames. ], batch size: 72, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:57:33,837 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 07:57:38,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450770.0, ans=0.125 2024-08-10 07:57:41,075 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 07:58:07,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 3.094e+01 3.435e+01 3.999e+01 7.884e+01, threshold=6.871e+01, percent-clipped=1.0 2024-08-10 07:58:12,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=450970.0, ans=10.0 2024-08-10 07:58:13,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=450970.0, ans=0.125 2024-08-10 07:58:17,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.21 vs. 
limit=15.0 2024-08-10 07:58:25,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451070.0, ans=0.1 2024-08-10 07:58:41,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=451170.0, ans=0.125 2024-08-10 07:58:46,851 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1650, loss[loss=0.1023, beats_loss=0.009959, ecapa_loss=0.0002483, whisper_loss=0.08986, over 17065.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01206, ecapa_loss=0.0002579, whisper_loss=0.09673, over 3855218.16 frames. ], batch size: 63, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:58:48,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=451270.0, ans=0.125 2024-08-10 07:59:03,555 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:59:08,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451370.0, ans=0.125 2024-08-10 07:59:09,322 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-10 07:59:28,036 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-10 07:59:40,412 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 07:59:50,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=22.5 2024-08-10 07:59:52,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.50 vs. 
limit=10.0 2024-08-10 07:59:56,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451670.0, ans=0.125 2024-08-10 07:59:58,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1700, loss[loss=0.1096, beats_loss=0.0108, ecapa_loss=0.0002738, whisper_loss=0.09608, over 23458.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01201, ecapa_loss=0.0002579, whisper_loss=0.09712, over 3851901.94 frames. ], batch size: 93, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:00:25,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=15.0 2024-08-10 08:00:31,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.130e+01 3.389e+01 3.948e+01 7.641e+01, threshold=6.778e+01, percent-clipped=2.0 2024-08-10 08:01:03,915 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 08:01:08,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1750, loss[loss=0.1281, beats_loss=0.01161, ecapa_loss=0.0002554, whisper_loss=0.114, over 22927.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01211, ecapa_loss=0.0002571, whisper_loss=0.09705, over 3865958.27 frames. ], batch size: 89, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:01:31,408 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 08:01:39,195 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 08:01:50,161 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 08:01:50,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=452570.0, ans=0.0 2024-08-10 08:01:56,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-08-10 08:01:57,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452570.0, ans=0.1 2024-08-10 08:02:18,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1800, loss[loss=0.1243, beats_loss=0.01039, ecapa_loss=0.0002498, whisper_loss=0.1114, over 15171.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01198, ecapa_loss=0.0002595, whisper_loss=0.09735, over 3845442.97 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:02:49,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 3.196e+01 3.582e+01 4.110e+01 5.783e+01, threshold=7.164e+01, percent-clipped=0.0 2024-08-10 08:03:08,237 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 08:03:26,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1850, loss[loss=0.08998, beats_loss=0.01218, ecapa_loss=0.0002628, whisper_loss=0.07518, over 15917.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01207, ecapa_loss=0.0002589, whisper_loss=0.09723, over 3840523.30 frames. ], batch size: 63, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:03:35,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=453270.0, ans=0.015 2024-08-10 08:04:08,520 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 08:04:19,749 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 08:04:23,596 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 08:04:39,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1900, loss[loss=0.09395, beats_loss=0.01178, ecapa_loss=0.0003119, whisper_loss=0.07905, over 16774.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01205, ecapa_loss=0.0002643, whisper_loss=0.09702, over 3811845.16 frames. ], batch size: 66, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:04:47,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=453770.0, ans=0.125 2024-08-10 08:04:52,340 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 08:04:52,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2024-08-10 08:05:10,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 3.027e+01 3.393e+01 3.845e+01 7.336e+01, threshold=6.786e+01, percent-clipped=1.0 2024-08-10 08:05:16,812 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 08:05:18,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=453970.0, ans=0.125 2024-08-10 08:05:34,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=454170.0, ans=0.125 2024-08-10 08:05:44,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.41 vs. 
limit=12.0 2024-08-10 08:05:45,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=454170.0, ans=0.0 2024-08-10 08:05:48,093 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 08:05:49,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 1950, loss[loss=0.1066, beats_loss=0.01201, ecapa_loss=0.000329, whisper_loss=0.09134, over 13816.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01199, ecapa_loss=0.0002695, whisper_loss=0.09729, over 3805470.54 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:05:54,371 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 08:05:59,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=454270.0, ans=0.0 2024-08-10 08:06:05,730 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
22 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 08:06:07,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=454370.0, ans=0.125 2024-08-10 08:06:24,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=454470.0, ans=0.125 2024-08-10 08:06:39,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=454570.0, ans=0.0 2024-08-10 08:06:47,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454670.0, ans=0.1 2024-08-10 08:06:55,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454670.0, ans=0.0 2024-08-10 08:07:00,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2000, loss[loss=0.105, beats_loss=0.01207, ecapa_loss=0.0003046, whisper_loss=0.0899, over 15860.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01205, ecapa_loss=0.0002714, whisper_loss=0.09684, over 3812416.69 frames. ], batch size: 62, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:07:05,404 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 08:07:34,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.702e+01 4.234e+01 5.771e+01, threshold=7.405e+01, percent-clipped=0.0 2024-08-10 08:08:01,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455170.0, ans=0.1 2024-08-10 08:08:04,574 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 08:08:13,075 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2050, loss[loss=0.1103, beats_loss=0.01241, ecapa_loss=0.0003053, whisper_loss=0.09481, over 22552.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01205, ecapa_loss=0.0002713, whisper_loss=0.0972, over 3840871.72 frames. ], batch size: 91, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:08:17,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=455270.0, ans=0.125 2024-08-10 08:08:31,259 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 08:08:36,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=455370.0, ans=0.09899494936611666 2024-08-10 08:08:40,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455470.0, ans=0.125 2024-08-10 08:08:50,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2024-08-10 08:08:55,148 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 08:09:14,767 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 08:09:24,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2100, loss[loss=0.1078, beats_loss=0.01302, ecapa_loss=0.0002687, whisper_loss=0.09213, over 21854.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01201, ecapa_loss=0.0002739, whisper_loss=0.09723, over 3806300.40 frames. ], batch size: 88, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:09:26,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. 
limit=22.5 2024-08-10 08:09:34,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=12.0 2024-08-10 08:09:50,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=455870.0, ans=0.025 2024-08-10 08:09:56,595 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 08:09:57,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.943e+01 3.340e+01 3.951e+01 7.714e+01, threshold=6.679e+01, percent-clipped=1.0 2024-08-10 08:10:28,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=456170.0, ans=0.125 2024-08-10 08:10:35,160 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 08:10:35,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=456270.0, ans=0.125 2024-08-10 08:10:36,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2150, loss[loss=0.1209, beats_loss=0.0138, ecapa_loss=0.0002447, whisper_loss=0.1047, over 22129.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0121, ecapa_loss=0.0002733, whisper_loss=0.09691, over 3839839.48 frames. ], batch size: 88, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:10:41,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-08-10 08:10:41,825 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 08:10:45,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=456270.0, ans=0.0 2024-08-10 08:10:48,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=456270.0, ans=0.0 2024-08-10 08:10:51,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-08-10 08:10:55,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=456370.0, ans=0.125 2024-08-10 08:11:10,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5 2024-08-10 08:11:10,840 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 08:11:18,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=456470.0, ans=0.125 2024-08-10 08:11:29,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-08-10 08:11:30,660 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-10 08:11:30,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=456570.0, ans=0.125 2024-08-10 08:11:40,546 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 08:11:42,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.13 vs. 
limit=12.0 2024-08-10 08:11:51,049 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2200, loss[loss=0.1107, beats_loss=0.01032, ecapa_loss=0.00037, whisper_loss=0.0967, over 14468.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01213, ecapa_loss=0.0002735, whisper_loss=0.09736, over 3826949.82 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:12:02,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-08-10 08:12:02,768 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:12:19,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=456870.0, ans=0.125 2024-08-10 08:12:25,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.98 vs. limit=10.0 2024-08-10 08:12:25,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-10 08:12:26,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.107e+01 3.618e+01 4.202e+01 6.900e+01, threshold=7.235e+01, percent-clipped=1.0 2024-08-10 08:12:50,685 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 08:12:51,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2024-08-10 08:13:02,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. 
limit=10.0 2024-08-10 08:13:04,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457270.0, ans=0.1 2024-08-10 08:13:05,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2250, loss[loss=0.103, beats_loss=0.01194, ecapa_loss=0.0002256, whisper_loss=0.08882, over 16299.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01211, ecapa_loss=0.0002772, whisper_loss=0.09774, over 3827719.47 frames. ], batch size: 60, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:13:15,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2024-08-10 08:13:30,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=457370.0, ans=0.1 2024-08-10 08:13:34,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=457470.0, ans=0.0 2024-08-10 08:13:36,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=457470.0, ans=0.125 2024-08-10 08:13:37,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=457470.0, ans=0.125 2024-08-10 08:13:45,710 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 08:14:06,121 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 08:14:21,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2300, loss[loss=0.103, beats_loss=0.01134, ecapa_loss=0.000291, whisper_loss=0.08879, over 18715.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01209, ecapa_loss=0.0002773, whisper_loss=0.09755, over 3838292.16 frames. 
], batch size: 75, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:14:33,762 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 08:14:36,223 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 08:14:37,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=457870.0, ans=0.125 2024-08-10 08:14:37,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=457870.0, ans=0.125 2024-08-10 08:14:51,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=457970.0, ans=0.125 2024-08-10 08:14:53,705 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 08:14:56,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 3.052e+01 3.526e+01 3.987e+01 6.394e+01, threshold=7.053e+01, percent-clipped=0.0 2024-08-10 08:15:04,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=457970.0, ans=0.0 2024-08-10 08:15:18,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-10 08:15:37,222 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2350, loss[loss=0.08085, beats_loss=0.01214, ecapa_loss=0.0002547, whisper_loss=0.06616, over 16120.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01219, ecapa_loss=0.0002767, whisper_loss=0.09711, over 3845310.75 frames. ], batch size: 63, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:15:39,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. 
limit=10.0 2024-08-10 08:15:41,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=458270.0, ans=0.0 2024-08-10 08:15:44,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=458270.0, ans=0.95 2024-08-10 08:15:46,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=458270.0, ans=0.0 2024-08-10 08:15:55,977 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 08:15:58,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-10 08:16:06,088 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 08:16:08,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=458470.0, ans=10.0 2024-08-10 08:16:09,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458470.0, ans=0.1 2024-08-10 08:16:21,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=458470.0, ans=0.0 2024-08-10 08:16:32,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-08-10 08:16:56,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2400, loss[loss=0.09898, beats_loss=0.01237, ecapa_loss=0.0003213, whisper_loss=0.0834, over 20813.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01201, ecapa_loss=0.0002762, whisper_loss=0.09798, over 3852357.74 frames. 
], batch size: 88, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:17:07,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2024-08-10 08:17:12,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=458870.0, ans=0.0 2024-08-10 08:17:20,051 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 08:17:27,218 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.146e-01 2024-08-10 08:17:29,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.868e+01 3.229e+01 3.686e+01 5.514e+01, threshold=6.458e+01, percent-clipped=0.0 2024-08-10 08:17:35,903 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 08:17:46,203 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 08:18:01,284 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 08:18:18,752 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2450, loss[loss=0.143, beats_loss=0.008808, ecapa_loss=0.0002487, whisper_loss=0.1317, over 17454.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01203, ecapa_loss=0.000275, whisper_loss=0.09813, over 3826932.16 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:18:23,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=459270.0, ans=0.2 2024-08-10 08:18:25,714 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
16 from LS+wenet, 26 from Vox, 50 fro AS 2024-08-10 08:18:25,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=459270.0, ans=0.1 2024-08-10 08:19:03,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=459570.0, ans=0.05 2024-08-10 08:19:15,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-10 08:19:38,530 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 08:19:41,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2500, loss[loss=0.109, beats_loss=0.01176, ecapa_loss=0.0003047, whisper_loss=0.09424, over 21714.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01202, ecapa_loss=0.000278, whisper_loss=0.09805, over 3836830.26 frames. ], batch size: 89, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:20:09,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.15 vs. limit=22.5 2024-08-10 08:20:16,039 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 08:20:19,688 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 08:20:31,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.999e+01 3.542e+01 3.925e+01 6.520e+01, threshold=7.085e+01, percent-clipped=1.0 2024-08-10 08:20:38,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. 
limit=10.0 2024-08-10 08:20:40,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459970.0, ans=0.125 2024-08-10 08:20:42,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=459970.0, ans=0.0 2024-08-10 08:20:50,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460070.0, ans=0.0 2024-08-10 08:21:05,135 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 08:21:07,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=460170.0, ans=0.125 2024-08-10 08:21:15,500 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.660e+00 2024-08-10 08:21:17,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460170.0, ans=0.125 2024-08-10 08:21:25,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2550, loss[loss=0.09665, beats_loss=0.01445, ecapa_loss=0.0002745, whisper_loss=0.07945, over 22315.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01205, ecapa_loss=0.0002784, whisper_loss=0.09754, over 3861968.93 frames. ], batch size: 94, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:21:32,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=460270.0, ans=0.0 2024-08-10 08:22:07,245 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
17 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-10 08:22:13,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=460470.0, ans=0.0 2024-08-10 08:23:02,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=460670.0, ans=0.125 2024-08-10 08:23:08,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2600, loss[loss=0.1395, beats_loss=0.007762, ecapa_loss=0.0002927, whisper_loss=0.1288, over 19663.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01195, ecapa_loss=0.0002786, whisper_loss=0.09827, over 3834695.12 frames. ], batch size: 75, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:23:32,517 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 08:24:01,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 3.079e+01 3.425e+01 3.855e+01 5.495e+01, threshold=6.850e+01, percent-clipped=0.0 2024-08-10 08:24:19,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2024-08-10 08:24:30,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461070.0, ans=0.1 2024-08-10 08:24:31,955 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
19 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 08:24:47,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=461170.0, ans=0.125 2024-08-10 08:24:59,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461170.0, ans=0.125 2024-08-10 08:25:03,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2650, loss[loss=0.1268, beats_loss=0.01151, ecapa_loss=0.0002967, whisper_loss=0.1123, over 20729.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01205, ecapa_loss=0.0002777, whisper_loss=0.09731, over 3825048.10 frames. ], batch size: 82, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:25:04,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=461270.0, ans=0.125 2024-08-10 08:25:13,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461270.0, ans=0.125 2024-08-10 08:25:20,743 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 08:25:42,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-10 08:26:03,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461470.0, ans=0.125 2024-08-10 08:26:07,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. 
limit=15.0 2024-08-10 08:26:18,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=461570.0, ans=0.2 2024-08-10 08:26:24,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=461570.0, ans=0.0 2024-08-10 08:26:54,763 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 08:26:57,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2700, loss[loss=0.1167, beats_loss=0.01297, ecapa_loss=0.0002156, whisper_loss=0.1015, over 18417.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01212, ecapa_loss=0.0002802, whisper_loss=0.09709, over 3851070.93 frames. ], batch size: 70, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:27:06,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=461770.0, ans=0.125 2024-08-10 08:27:22,041 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 08:27:22,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=461870.0, ans=0.125 2024-08-10 08:27:39,654 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 08:27:48,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 3.222e+01 3.601e+01 4.234e+01 3.838e+02, threshold=7.201e+01, percent-clipped=7.0 2024-08-10 08:28:21,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462170.0, ans=0.1 2024-08-10 08:28:30,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2750, loss[loss=0.1139, beats_loss=0.01098, ecapa_loss=0.0002789, whisper_loss=0.1001, over 17725.00 frames. 
], tot_loss[loss=0.112, beats_loss=0.01211, ecapa_loss=0.000279, whisper_loss=0.09712, over 3886457.09 frames. ], batch size: 71, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:28:33,907 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-10 08:29:22,446 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 08:29:22,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462570.0, ans=0.1 2024-08-10 08:29:43,387 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 08:29:45,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2800, loss[loss=0.1247, beats_loss=0.01257, ecapa_loss=0.0003065, whisper_loss=0.109, over 21825.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01207, ecapa_loss=0.0002773, whisper_loss=0.09767, over 3883209.02 frames. ], batch size: 92, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:29:46,262 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 08:29:50,684 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 08:29:52,087 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 08:29:56,454 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 08:30:00,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=462870.0, ans=0.0 2024-08-10 08:30:08,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. 
limit=22.5 2024-08-10 08:30:13,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=462870.0, ans=0.0 2024-08-10 08:30:15,909 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 08:30:19,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.197e+01 3.685e+01 4.218e+01 5.823e+01, threshold=7.371e+01, percent-clipped=0.0 2024-08-10 08:30:29,537 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 08:30:31,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=463070.0, ans=0.0 2024-08-10 08:30:33,666 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 15 from Vox, 53 fro AS 2024-08-10 08:30:41,400 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 08:30:41,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=463070.0, ans=0.125 2024-08-10 08:30:41,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-10 08:30:42,857 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 08:31:01,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2850, loss[loss=0.1348, beats_loss=0.01026, ecapa_loss=0.0002812, whisper_loss=0.1217, over 21322.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01207, ecapa_loss=0.0002769, whisper_loss=0.0982, over 3885179.30 frames. ], batch size: 86, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:31:04,248 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 08:31:10,342 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 08:31:13,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2024-08-10 08:31:19,424 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 13 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 08:31:20,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2024-08-10 08:31:31,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=463470.0, ans=0.0 2024-08-10 08:32:24,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2900, loss[loss=0.1286, beats_loss=0.01093, ecapa_loss=0.0003207, whisper_loss=0.1144, over 21807.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.0121, ecapa_loss=0.000279, whisper_loss=0.09782, over 3885284.99 frames. ], batch size: 88, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:32:28,666 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 08:33:03,956 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 08:33:04,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=463970.0, ans=0.125 2024-08-10 08:33:05,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.004e+01 3.404e+01 3.788e+01 1.422e+02, threshold=6.807e+01, percent-clipped=1.0 2024-08-10 08:33:18,724 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 08:33:20,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.14 vs. limit=22.5 2024-08-10 08:33:39,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464170.0, ans=0.1 2024-08-10 08:33:49,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=464170.0, ans=0.125 2024-08-10 08:33:54,760 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 20 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-10 08:33:55,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 2950, loss[loss=0.0831, beats_loss=0.0122, ecapa_loss=0.0002715, whisper_loss=0.06819, over 21850.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01213, ecapa_loss=0.0002792, whisper_loss=0.09724, over 3882518.08 frames. ], batch size: 94, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:34:00,495 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 08:34:05,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=464270.0, ans=0.125 2024-08-10 08:34:41,449 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 08:34:59,789 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 08:35:00,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=464570.0, ans=0.0 2024-08-10 08:35:16,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=464670.0, ans=0.0 2024-08-10 08:35:27,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3000, loss[loss=0.1135, beats_loss=0.01318, ecapa_loss=0.0002255, whisper_loss=0.09806, over 23643.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01219, ecapa_loss=0.0002778, whisper_loss=0.09736, over 3884465.54 frames. ], batch size: 91, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:35:27,857 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 08:36:05,694 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on ASR_libri: loss=0.2648, beats_loss=0, ecapa_loss=0.0008316, whisper_loss=0.2565, over 922467.00 frames. 2024-08-10 08:36:19,084 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9080, 3.3732, 3.5524, 3.5650], device='cuda:1') 2024-08-10 08:36:23,188 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on SV_voxceleb1: loss=0.007277, beats_loss=0, ecapa_loss=0.0007277, whisper_loss=0, over 939242.00 frames. 2024-08-10 08:38:19,683 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on AT_audioset: loss=0.0279, beats_loss=0.0279, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 08:38:19,687 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 08:38:33,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. 
limit=15.0 2024-08-10 08:38:39,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=464870.0, ans=0.125 2024-08-10 08:38:54,738 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-10 08:38:56,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=464970.0, ans=0.125 2024-08-10 08:38:57,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 3.167e+01 3.615e+01 4.298e+01 8.066e+01, threshold=7.230e+01, percent-clipped=1.0 2024-08-10 08:38:58,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=464970.0, ans=0.125 2024-08-10 08:39:09,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=465070.0, ans=0.125 2024-08-10 08:39:33,919 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:39:40,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3050, loss[loss=0.1114, beats_loss=0.01084, ecapa_loss=0.0003413, whisper_loss=0.09718, over 18319.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01211, ecapa_loss=0.0002805, whisper_loss=0.09824, over 3868696.31 frames. ], batch size: 75, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:39:56,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=465370.0, ans=0.95 2024-08-10 08:40:15,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=465470.0, ans=0.125 2024-08-10 08:40:35,411 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 08:40:40,480 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 08:40:52,269 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 08:40:59,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=465670.0, ans=0.1 2024-08-10 08:41:03,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3100, loss[loss=0.1022, beats_loss=0.01349, ecapa_loss=0.0003667, whisper_loss=0.08502, over 21506.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01224, ecapa_loss=0.0002811, whisper_loss=0.09783, over 3875701.74 frames. ], batch size: 92, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:41:19,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=465770.0, ans=0.035 2024-08-10 08:41:30,227 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 08:41:33,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=465870.0, ans=0.125 2024-08-10 08:41:40,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=465970.0, ans=0.125 2024-08-10 08:41:43,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.398e+01 3.878e+01 4.582e+01 1.719e+02, threshold=7.756e+01, percent-clipped=2.0 2024-08-10 08:41:49,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.15 vs. limit=15.0 2024-08-10 08:41:59,491 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 08:42:33,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3150, loss[loss=0.1014, beats_loss=0.01212, ecapa_loss=0.0003045, whisper_loss=0.08621, over 20804.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01226, ecapa_loss=0.0002813, whisper_loss=0.09766, over 3878362.38 frames. ], batch size: 86, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:42:33,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=466270.0, ans=0.125 2024-08-10 08:42:53,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466370.0, ans=0.1 2024-08-10 08:42:59,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=466370.0, ans=0.125 2024-08-10 08:43:03,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=466370.0, ans=0.125 2024-08-10 08:43:33,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=466570.0, ans=0.0 2024-08-10 08:43:54,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2024-08-10 08:43:57,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=466770.0, ans=0.125 2024-08-10 08:43:58,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3200, loss[loss=0.1113, beats_loss=0.01239, ecapa_loss=0.0002917, whisper_loss=0.09603, over 20330.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01224, ecapa_loss=0.0002813, whisper_loss=0.09769, over 3863142.98 frames. 
], batch size: 84, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:44:11,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=466770.0, ans=0.125 2024-08-10 08:44:11,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466770.0, ans=0.1 2024-08-10 08:44:13,590 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.644e-01 2024-08-10 08:44:35,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466970.0, ans=0.125 2024-08-10 08:44:40,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.101e+01 3.705e+01 4.309e+01 1.166e+02, threshold=7.411e+01, percent-clipped=1.0 2024-08-10 08:44:45,690 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.838e+03 2024-08-10 08:44:47,497 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 08:44:53,775 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 08:44:54,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=12.0 2024-08-10 08:44:58,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=467070.0, ans=0.125 2024-08-10 08:45:11,520 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-10 08:45:22,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=467170.0, ans=0.1 2024-08-10 08:45:32,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3250, loss[loss=0.1081, beats_loss=0.01248, ecapa_loss=0.0002573, whisper_loss=0.09307, over 22316.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.0122, ecapa_loss=0.0002804, whisper_loss=0.09819, over 3893836.31 frames. ], batch size: 90, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:45:52,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.85 vs. limit=22.5 2024-08-10 08:45:58,485 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 08:46:10,128 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:46:14,015 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 08:46:25,156 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 08:46:28,904 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 08:46:55,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=467670.0, ans=15.0 2024-08-10 08:47:04,036 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3300, loss[loss=0.1037, beats_loss=0.01268, ecapa_loss=0.0002731, whisper_loss=0.08831, over 18048.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01214, ecapa_loss=0.00028, whisper_loss=0.09821, over 3872541.34 frames. ], batch size: 74, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:47:11,633 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 08:47:22,736 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 08:47:24,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467870.0, ans=0.1 2024-08-10 08:47:46,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 3.041e+01 3.344e+01 3.812e+01 6.169e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 08:47:57,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=468070.0, ans=0.2 2024-08-10 08:48:12,922 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 08:48:19,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=468170.0, ans=0.1 2024-08-10 08:48:34,242 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3350, loss[loss=0.1268, beats_loss=0.009963, ecapa_loss=0.000238, whisper_loss=0.1144, over 15838.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01207, ecapa_loss=0.0002802, whisper_loss=0.09875, over 3892177.62 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:48:39,611 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 08:48:43,935 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
23 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 08:48:48,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=468370.0, ans=0.2 2024-08-10 08:48:50,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=468370.0, ans=0.125 2024-08-10 08:49:42,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=468670.0, ans=0.2 2024-08-10 08:49:58,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3400, loss[loss=0.1065, beats_loss=0.01481, ecapa_loss=0.0003124, whisper_loss=0.08856, over 21663.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01215, ecapa_loss=0.0002774, whisper_loss=0.0981, over 3905127.73 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:49:59,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=468770.0, ans=0.0 2024-08-10 08:50:26,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468870.0, ans=0.0 2024-08-10 08:50:34,202 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.156e+01 3.587e+01 4.181e+01 1.855e+02, threshold=7.174e+01, percent-clipped=2.0 2024-08-10 08:50:39,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=15.0 2024-08-10 08:50:51,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=469070.0, ans=0.2 2024-08-10 08:51:02,444 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.027e+01 2024-08-10 08:51:04,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469170.0, ans=0.1 2024-08-10 08:51:15,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3450, loss[loss=0.1152, beats_loss=0.01353, ecapa_loss=0.0002587, whisper_loss=0.09904, over 23686.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01218, ecapa_loss=0.0002798, whisper_loss=0.0972, over 3920903.01 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:51:19,219 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 08:51:28,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=469270.0, ans=0.125 2024-08-10 08:51:48,069 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-10 08:51:51,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=469470.0, ans=0.0 2024-08-10 08:52:08,533 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 08:52:12,826 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 08:52:15,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=469670.0, ans=0.07 2024-08-10 08:52:24,738 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 08:52:29,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3500, loss[loss=0.121, beats_loss=0.01079, ecapa_loss=0.0002576, whisper_loss=0.1077, over 21190.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01226, ecapa_loss=0.0002793, whisper_loss=0.09639, over 3896363.69 frames. ], batch size: 83, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:52:30,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=469770.0, ans=10.0 2024-08-10 08:52:38,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=469770.0, ans=0.125 2024-08-10 08:52:41,852 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 08:52:56,891 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 08:53:04,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 3.037e+01 3.390e+01 3.981e+01 6.541e+01, threshold=6.780e+01, percent-clipped=0.0 2024-08-10 08:53:16,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-10 08:53:42,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-10 08:53:44,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3550, loss[loss=0.1279, beats_loss=0.01121, ecapa_loss=0.0002799, whisper_loss=0.1139, over 20788.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01229, ecapa_loss=0.0002804, whisper_loss=0.0959, over 3881622.39 frames. 
], batch size: 80, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:53:57,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=470370.0, ans=0.2 2024-08-10 08:54:00,230 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 08:54:08,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=470370.0, ans=0.0 2024-08-10 08:54:09,414 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 08:54:15,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-10 08:54:23,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=470470.0, ans=0.07 2024-08-10 08:54:25,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=470470.0, ans=0.0 2024-08-10 08:54:29,288 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 08:54:43,026 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 08:54:54,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=470670.0, ans=0.125 2024-08-10 08:54:57,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3600, loss[loss=0.105, beats_loss=0.01162, ecapa_loss=0.0003002, whisper_loss=0.09036, over 17480.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01224, ecapa_loss=0.0002785, whisper_loss=0.09587, over 3859212.56 frames. ], batch size: 72, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:55:01,592 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 08:55:08,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470770.0, ans=0.1 2024-08-10 08:55:22,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470870.0, ans=0.1 2024-08-10 08:55:29,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.993e+01 3.332e+01 3.946e+01 5.463e+01, threshold=6.665e+01, percent-clipped=0.0 2024-08-10 08:55:41,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=471070.0, ans=0.125 2024-08-10 08:55:41,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=471070.0, ans=0.0 2024-08-10 08:55:47,192 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 08:56:10,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3650, loss[loss=0.1262, beats_loss=0.01068, ecapa_loss=0.000292, whisper_loss=0.1126, over 22597.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01226, ecapa_loss=0.0002779, whisper_loss=0.09589, over 3856997.41 frames. ], batch size: 88, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:56:38,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2024-08-10 08:56:41,419 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 08:56:44,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.41 vs. 
limit=15.0 2024-08-10 08:56:50,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-08-10 08:56:58,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=471570.0, ans=0.125 2024-08-10 08:57:01,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=471570.0, ans=0.125 2024-08-10 08:57:05,460 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 08:57:19,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=471770.0, ans=0.2 2024-08-10 08:57:19,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=471770.0, ans=0.2 2024-08-10 08:57:20,478 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3700, loss[loss=0.1064, beats_loss=0.01147, ecapa_loss=0.0002984, whisper_loss=0.09193, over 22093.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01232, ecapa_loss=0.0002775, whisper_loss=0.09499, over 3833969.76 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:57:39,576 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 08:57:49,412 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 08:57:50,693 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 08:57:51,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 3.070e+01 3.607e+01 4.290e+01 1.526e+02, threshold=7.214e+01, percent-clipped=4.0 2024-08-10 08:58:05,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=472070.0, ans=0.0 2024-08-10 08:58:17,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=472170.0, ans=0.025 2024-08-10 08:58:17,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=12.0 2024-08-10 08:58:22,168 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 08:58:27,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3750, loss[loss=0.1195, beats_loss=0.01451, ecapa_loss=0.0003074, whisper_loss=0.1019, over 21927.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0124, ecapa_loss=0.0002797, whisper_loss=0.0953, over 3839886.94 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:58:41,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=472370.0, ans=0.0 2024-08-10 08:58:55,422 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 08:59:01,130 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 08:59:03,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-10 08:59:14,475 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 08:59:35,325 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 08:59:40,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3800, loss[loss=0.1072, beats_loss=0.01373, ecapa_loss=0.0002341, whisper_loss=0.09113, over 22108.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01234, ecapa_loss=0.0002776, whisper_loss=0.0962, over 3862393.12 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:01,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=472870.0, ans=0.0 2024-08-10 09:00:13,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.142e+01 3.387e+01 4.333e+01 6.143e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-10 09:00:23,313 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 09:00:36,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473170.0, ans=0.1 2024-08-10 09:00:38,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=473170.0, ans=0.125 2024-08-10 09:00:39,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2024-08-10 09:00:49,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=473170.0, ans=0.015 2024-08-10 09:00:52,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3850, loss[loss=0.08912, beats_loss=0.0134, ecapa_loss=0.0002926, whisper_loss=0.07279, over 13064.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01235, ecapa_loss=0.0002767, whisper_loss=0.09591, over 3841712.37 frames. 
], batch size: 54, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:56,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-10 09:00:58,078 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 09:01:06,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-10 09:01:12,091 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 09:01:27,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=473470.0, ans=0.125 2024-08-10 09:02:04,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3900, loss[loss=0.1177, beats_loss=0.01013, ecapa_loss=0.0003426, whisper_loss=0.1041, over 21239.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01224, ecapa_loss=0.0002793, whisper_loss=0.09738, over 3861813.45 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:02:09,503 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 09:02:16,818 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 09:02:19,941 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 09:02:28,191 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 09:02:34,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. 
limit=15.0 2024-08-10 09:02:38,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+01 3.153e+01 3.691e+01 4.376e+01 6.503e+01, threshold=7.382e+01, percent-clipped=0.0 2024-08-10 09:02:42,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=473970.0, ans=0.125 2024-08-10 09:02:44,886 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 09:02:51,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=12.0 2024-08-10 09:02:53,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474070.0, ans=0.1 2024-08-10 09:02:58,042 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 09:03:11,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=12.0 2024-08-10 09:03:14,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=474170.0, ans=0.0 2024-08-10 09:03:17,553 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 3950, loss[loss=0.1338, beats_loss=0.01083, ecapa_loss=0.0003438, whisper_loss=0.1195, over 21556.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01222, ecapa_loss=0.0002809, whisper_loss=0.09774, over 3881006.81 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:03:23,913 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 09:03:38,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=474370.0, ans=0.05 2024-08-10 09:03:45,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-10 09:03:57,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5 2024-08-10 09:04:05,734 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.41 vs. limit=22.5 2024-08-10 09:04:15,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=474670.0, ans=0.125 2024-08-10 09:04:15,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=474670.0, ans=0.125 2024-08-10 09:04:28,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4000, loss[loss=0.1151, beats_loss=0.01183, ecapa_loss=0.000327, whisper_loss=0.09995, over 21791.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01212, ecapa_loss=0.0002828, whisper_loss=0.09833, over 3908367.26 frames. ], batch size: 88, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:04:32,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=474770.0, ans=0.125 2024-08-10 09:04:43,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474870.0, ans=0.125 2024-08-10 09:04:43,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. 
limit=15.0 2024-08-10 09:04:51,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2024-08-10 09:04:51,845 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 09:04:55,441 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 09:04:56,939 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 09:05:02,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 3.204e+01 3.613e+01 4.111e+01 7.755e+01, threshold=7.226e+01, percent-clipped=1.0 2024-08-10 09:05:07,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=474970.0, ans=0.2 2024-08-10 09:05:07,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=474970.0, ans=0.125 2024-08-10 09:05:12,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475070.0, ans=0.1 2024-08-10 09:05:34,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=475170.0, ans=0.0 2024-08-10 09:05:43,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4050, loss[loss=0.1028, beats_loss=0.01253, ecapa_loss=0.0002888, whisper_loss=0.08742, over 15692.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01211, ecapa_loss=0.0002809, whisper_loss=0.0981, over 3893357.44 frames. 
], batch size: 63, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:05:53,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=475270.0, ans=0.125 2024-08-10 09:06:02,420 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.518e-03 2024-08-10 09:06:17,643 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 09:06:33,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=475570.0, ans=0.125 2024-08-10 09:06:34,079 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 09:06:42,603 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 09:06:48,561 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 09:06:49,911 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 09:06:51,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=475670.0, ans=0.125 2024-08-10 09:06:56,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=15.0 2024-08-10 09:06:57,611 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4100, loss[loss=0.1121, beats_loss=0.01092, ecapa_loss=0.0003572, whisper_loss=0.09758, over 17282.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01207, ecapa_loss=0.0002822, whisper_loss=0.09792, over 3839939.75 frames. 
], batch size: 70, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:07:07,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475770.0, ans=0.1 2024-08-10 09:07:14,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475870.0, ans=0.04949747468305833 2024-08-10 09:07:18,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=475870.0, ans=0.025 2024-08-10 09:07:23,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475870.0, ans=0.1 2024-08-10 09:07:34,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 3.046e+01 3.447e+01 3.852e+01 5.765e+01, threshold=6.895e+01, percent-clipped=0.0 2024-08-10 09:07:45,506 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 24 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-10 09:07:47,340 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 09:07:53,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=476070.0, ans=0.125 2024-08-10 09:08:04,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=476170.0, ans=0.125 2024-08-10 09:08:15,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4150, loss[loss=0.1411, beats_loss=0.007965, ecapa_loss=0.000288, whisper_loss=0.1303, over 17384.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01208, ecapa_loss=0.0002814, whisper_loss=0.09776, over 3856571.63 frames. 
], batch size: 67, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:08:17,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=476270.0, ans=0.125 2024-08-10 09:08:19,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-08-10 09:08:23,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-10 09:08:27,773 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.532e+00 2024-08-10 09:08:37,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=476370.0, ans=0.2 2024-08-10 09:08:42,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=476370.0, ans=0.2 2024-08-10 09:08:45,106 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-10 09:09:07,698 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-10 09:09:23,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=476670.0, ans=0.125 2024-08-10 09:09:38,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4200, loss[loss=0.09921, beats_loss=0.01367, ecapa_loss=0.0002866, whisper_loss=0.08267, over 18870.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01214, ecapa_loss=0.0002818, whisper_loss=0.09738, over 3862416.86 frames. 
], batch size: 78, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:10:12,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 3.164e+01 3.633e+01 4.360e+01 6.348e+01, threshold=7.265e+01, percent-clipped=0.0 2024-08-10 09:10:16,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=476970.0, ans=0.0 2024-08-10 09:10:21,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=476970.0, ans=0.2 2024-08-10 09:10:35,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=477070.0, ans=0.125 2024-08-10 09:10:38,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=477170.0, ans=0.125 2024-08-10 09:10:46,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477170.0, ans=0.0 2024-08-10 09:10:46,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-08-10 09:10:52,677 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 09:10:56,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4250, loss[loss=0.09807, beats_loss=0.01294, ecapa_loss=0.0002871, whisper_loss=0.08225, over 17736.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01206, ecapa_loss=0.0002812, whisper_loss=0.0976, over 3869971.67 frames. ], batch size: 72, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:11:08,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-10 09:11:10,687 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 09:11:12,797 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 09:11:30,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477470.0, ans=0.125 2024-08-10 09:11:31,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=477470.0, ans=0.2 2024-08-10 09:11:39,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=477470.0, ans=0.125 2024-08-10 09:12:01,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=477670.0, ans=0.0 2024-08-10 09:12:16,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4300, loss[loss=0.122, beats_loss=0.01296, ecapa_loss=0.0002549, whisper_loss=0.1065, over 20729.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01213, ecapa_loss=0.0002788, whisper_loss=0.09708, over 3869852.90 frames. ], batch size: 82, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:12:17,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0 2024-08-10 09:12:23,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=477770.0, ans=0.125 2024-08-10 09:12:25,459 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 09:12:29,941 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 09:12:54,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.896e+01 3.194e+01 3.711e+01 5.609e+01, threshold=6.388e+01, percent-clipped=0.0 2024-08-10 09:13:04,012 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 09:13:05,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-10 09:13:10,988 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 09:13:24,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-10 09:13:29,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=478170.0, ans=0.2 2024-08-10 09:13:31,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-10 09:13:34,368 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4350, loss[loss=0.08101, beats_loss=0.0161, ecapa_loss=0.0002217, whisper_loss=0.06269, over 15164.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01209, ecapa_loss=0.0002803, whisper_loss=0.09737, over 3875521.27 frames. ], batch size: 61, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:13:44,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478270.0, ans=0.0 2024-08-10 09:13:47,372 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 09:13:59,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=478370.0, ans=0.125 2024-08-10 09:14:01,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=478370.0, ans=0.0 2024-08-10 09:14:24,607 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 09:14:26,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478570.0, ans=0.1 2024-08-10 09:14:28,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=478570.0, ans=0.125 2024-08-10 09:14:39,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478670.0, ans=0.125 2024-08-10 09:14:40,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=478670.0, ans=0.2 2024-08-10 09:14:42,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=478670.0, ans=0.125 2024-08-10 09:14:47,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=478670.0, ans=0.2 2024-08-10 09:14:51,317 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4400, loss[loss=0.1161, beats_loss=0.01179, ecapa_loss=0.0003186, whisper_loss=0.1011, over 14440.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01213, ecapa_loss=0.0002785, whisper_loss=0.09734, over 3863763.30 frames. 
], batch size: 59, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:14:54,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-10 09:14:57,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2024-08-10 09:14:59,379 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 09:15:15,289 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-10 09:15:23,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-10 09:15:24,454 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-10 09:15:25,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 3.041e+01 3.447e+01 3.976e+01 9.860e+01, threshold=6.894e+01, percent-clipped=1.0 2024-08-10 09:15:28,133 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 09:15:54,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479170.0, ans=0.1 2024-08-10 09:15:56,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=479170.0, ans=0.125 2024-08-10 09:15:58,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=479170.0, ans=0.125 2024-08-10 09:16:04,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4450, loss[loss=0.1138, beats_loss=0.0131, ecapa_loss=0.0002879, whisper_loss=0.0978, over 16524.00 frames. 
], tot_loss[loss=0.1118, beats_loss=0.01214, ecapa_loss=0.0002783, whisper_loss=0.09685, over 3870535.64 frames. ], batch size: 67, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:16:13,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=479270.0, ans=0.025 2024-08-10 09:16:17,297 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 09:16:38,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2024-08-10 09:16:53,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=479570.0, ans=0.0 2024-08-10 09:17:04,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=479670.0, ans=0.0 2024-08-10 09:17:11,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2024-08-10 09:17:11,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4500, loss[loss=0.1084, beats_loss=0.0131, ecapa_loss=0.0002496, whisper_loss=0.09281, over 23101.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01217, ecapa_loss=0.0002777, whisper_loss=0.09632, over 3880863.90 frames. ], batch size: 93, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:17:25,226 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-10 09:17:31,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=479870.0, ans=0.0 2024-08-10 09:17:38,970 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
25 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 09:17:44,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 3.224e+01 3.675e+01 4.252e+01 6.669e+01, threshold=7.350e+01, percent-clipped=1.0 2024-08-10 09:17:45,658 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-10 09:18:19,030 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 09:18:20,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4550, loss[loss=0.1155, beats_loss=0.01269, ecapa_loss=0.0002754, whisper_loss=0.1, over 22012.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01223, ecapa_loss=0.0002781, whisper_loss=0.09575, over 3886325.18 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:18:20,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-10 09:18:28,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480270.0, ans=0.1 2024-08-10 09:18:34,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2024-08-10 09:18:42,124 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 09:18:51,911 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 09:19:04,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=480570.0, ans=0.125 2024-08-10 09:19:24,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. 
limit=22.5 2024-08-10 09:19:27,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4600, loss[loss=0.09485, beats_loss=0.01184, ecapa_loss=0.0002234, whisper_loss=0.08078, over 19389.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01225, ecapa_loss=0.0002787, whisper_loss=0.0959, over 3892010.08 frames. ], batch size: 75, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:19:38,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=8.0 2024-08-10 09:19:44,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=22.5 2024-08-10 09:19:48,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=480870.0, ans=0.05 2024-08-10 09:19:51,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=480870.0, ans=0.0 2024-08-10 09:19:56,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.144e+01 3.622e+01 4.296e+01 6.398e+01, threshold=7.244e+01, percent-clipped=0.0 2024-08-10 09:20:05,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=481070.0, ans=0.125 2024-08-10 09:20:08,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=481070.0, ans=0.125 2024-08-10 09:20:13,410 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 09:20:15,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=481070.0, ans=0.0 2024-08-10 09:20:20,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481170.0, ans=0.125 2024-08-10 09:20:26,630 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 09:20:32,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4650, loss[loss=0.105, beats_loss=0.01143, ecapa_loss=0.0002868, whisper_loss=0.09074, over 22217.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0123, ecapa_loss=0.0002773, whisper_loss=0.09507, over 3837311.23 frames. ], batch size: 89, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:20:33,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481270.0, ans=0.125 2024-08-10 09:20:35,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=481270.0, ans=0.0 2024-08-10 09:20:43,124 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-10 09:20:44,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481370.0, ans=0.1 2024-08-10 09:20:56,168 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 29 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 09:20:58,754 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 09:21:04,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.30 vs. 
limit=12.0 2024-08-10 09:21:22,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=481570.0, ans=0.125 2024-08-10 09:21:37,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4700, loss[loss=0.1225, beats_loss=0.01316, ecapa_loss=0.0002217, whisper_loss=0.1071, over 18030.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01232, ecapa_loss=0.0002757, whisper_loss=0.09597, over 3873455.44 frames. ], batch size: 67, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:21:59,154 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 09:22:01,747 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 09:22:07,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 3.095e+01 3.461e+01 3.864e+01 6.358e+01, threshold=6.922e+01, percent-clipped=0.0 2024-08-10 09:22:15,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-10 09:22:32,255 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 09:22:37,274 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 09:22:40,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=482170.0, ans=0.125 2024-08-10 09:22:42,259 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4750, loss[loss=0.09414, beats_loss=0.01114, ecapa_loss=0.0002992, whisper_loss=0.08001, over 16301.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01232, ecapa_loss=0.0002735, whisper_loss=0.09545, over 3876202.95 frames. 
], batch size: 66, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:22:52,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=482270.0, ans=0.125 2024-08-10 09:23:00,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=482370.0, ans=0.0 2024-08-10 09:23:15,415 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 09:23:22,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-10 09:23:25,977 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-10 09:23:27,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482570.0, ans=0.1 2024-08-10 09:23:28,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=482570.0, ans=0.125 2024-08-10 09:23:44,999 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 09:23:45,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4800, loss[loss=0.1057, beats_loss=0.01364, ecapa_loss=0.0003265, whisper_loss=0.08877, over 21396.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01242, ecapa_loss=0.0002746, whisper_loss=0.09479, over 3851550.80 frames. 
], batch size: 90, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:23:52,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482770.0, ans=0.1 2024-08-10 09:23:52,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482770.0, ans=0.1 2024-08-10 09:23:53,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=482770.0, ans=0.125 2024-08-10 09:24:09,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=482870.0, ans=0.05 2024-08-10 09:24:14,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 3.085e+01 3.419e+01 4.209e+01 9.011e+01, threshold=6.838e+01, percent-clipped=2.0 2024-08-10 09:24:22,915 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 09:24:25,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=483070.0, ans=0.125 2024-08-10 09:24:35,084 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 09:24:39,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=483170.0, ans=0.0 2024-08-10 09:24:40,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=483170.0, ans=0.0 2024-08-10 09:24:49,340 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4850, loss[loss=0.1279, beats_loss=0.0116, ecapa_loss=0.0002763, whisper_loss=0.1135, over 22673.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01229, ecapa_loss=0.0002759, whisper_loss=0.09611, over 3881080.58 frames. 
], batch size: 92, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:24:49,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=483270.0, ans=0.05 2024-08-10 09:25:05,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2024-08-10 09:25:30,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=483570.0, ans=0.05 2024-08-10 09:25:30,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=483570.0, ans=0.0 2024-08-10 09:25:44,364 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 09:25:53,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=483670.0, ans=0.125 2024-08-10 09:25:56,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2024-08-10 09:25:58,071 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 09:26:02,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4900, loss[loss=0.08543, beats_loss=0.01533, ecapa_loss=0.0002081, whisper_loss=0.06802, over 14316.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01235, ecapa_loss=0.0002755, whisper_loss=0.09591, over 3894193.27 frames. ], batch size: 54, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:26:07,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. 
limit=15.0 2024-08-10 09:26:17,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-08-10 09:26:29,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=483870.0, ans=0.125 2024-08-10 09:26:31,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=483870.0, ans=0.125 2024-08-10 09:26:35,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=483970.0, ans=0.0 2024-08-10 09:26:35,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=483970.0, ans=0.2 2024-08-10 09:26:39,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 3.191e+01 3.639e+01 4.118e+01 6.849e+01, threshold=7.278e+01, percent-clipped=1.0 2024-08-10 09:26:53,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=484070.0, ans=0.125 2024-08-10 09:26:53,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484070.0, ans=0.1 2024-08-10 09:27:03,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0 2024-08-10 09:27:29,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 4950, loss[loss=0.1066, beats_loss=0.01152, ecapa_loss=0.0003059, whisper_loss=0.09206, over 19648.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01228, ecapa_loss=0.0002762, whisper_loss=0.09605, over 3853602.09 frames. 
], batch size: 82, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:27:33,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=484270.0, ans=0.125 2024-08-10 09:28:05,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-10 09:28:09,610 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 09:28:12,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=484470.0, ans=0.125 2024-08-10 09:28:37,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=484570.0, ans=0.125 2024-08-10 09:28:44,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2024-08-10 09:28:50,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=484670.0, ans=0.125 2024-08-10 09:28:52,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=484670.0, ans=0.125 2024-08-10 09:29:04,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-10 09:29:06,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5000, loss[loss=0.133, beats_loss=0.00757, ecapa_loss=0.0002949, whisper_loss=0.1225, over 17030.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01228, ecapa_loss=0.0002745, whisper_loss=0.09547, over 3856414.01 frames. ], batch size: 64, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:29:13,859 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 09:29:35,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=484870.0, ans=0.0 2024-08-10 09:29:46,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=8.0 2024-08-10 09:29:52,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+01 3.034e+01 3.424e+01 4.085e+01 5.403e+01, threshold=6.848e+01, percent-clipped=0.0 2024-08-10 09:30:33,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=485170.0, ans=0.125 2024-08-10 09:30:37,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=485170.0, ans=0.125 2024-08-10 09:30:44,504 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5050, loss[loss=0.09847, beats_loss=0.01284, ecapa_loss=0.0002796, whisper_loss=0.08283, over 17193.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01238, ecapa_loss=0.0002755, whisper_loss=0.09587, over 3889602.45 frames. ], batch size: 70, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:30:58,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=485270.0, ans=0.0 2024-08-10 09:31:10,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=485370.0, ans=0.2 2024-08-10 09:31:10,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=485370.0, ans=0.04949747468305833 2024-08-10 09:31:21,335 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 09:31:29,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-10 09:31:38,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=485470.0, ans=0.125 2024-08-10 09:31:41,642 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 09:31:44,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-10 09:32:13,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=485670.0, ans=0.125 2024-08-10 09:32:16,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5100, loss[loss=0.1121, beats_loss=0.01056, ecapa_loss=0.0002656, whisper_loss=0.09883, over 23187.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01237, ecapa_loss=0.0002738, whisper_loss=0.0963, over 3895568.63 frames. ], batch size: 92, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:32:36,772 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 09:32:45,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.495e+01 3.245e+01 3.767e+01 4.403e+01 1.091e+02, threshold=7.533e+01, percent-clipped=4.0 2024-08-10 09:32:49,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=485970.0, ans=0.125 2024-08-10 09:33:00,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486070.0, ans=0.125 2024-08-10 09:33:05,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=486070.0, ans=0.0 2024-08-10 09:33:14,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=486170.0, ans=0.125 2024-08-10 09:33:18,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5 2024-08-10 09:33:19,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=486270.0, ans=0.0 2024-08-10 09:33:20,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5150, loss[loss=0.1013, beats_loss=0.01128, ecapa_loss=0.0002289, whisper_loss=0.08773, over 15195.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01229, ecapa_loss=0.0002718, whisper_loss=0.09696, over 3880203.51 frames. ], batch size: 59, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:33:29,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=486270.0, ans=0.125 2024-08-10 09:33:33,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.21 vs. 
limit=22.5 2024-08-10 09:33:34,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=486370.0, ans=0.125 2024-08-10 09:33:44,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=486470.0, ans=0.125 2024-08-10 09:33:49,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. limit=10.0 2024-08-10 09:34:01,666 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-10 09:34:03,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.0 2024-08-10 09:34:04,219 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 09:34:20,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=486670.0, ans=0.125 2024-08-10 09:34:22,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5200, loss[loss=0.09955, beats_loss=0.01228, ecapa_loss=0.0003378, whisper_loss=0.08389, over 16589.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01224, ecapa_loss=0.000272, whisper_loss=0.09657, over 3853645.93 frames. ], batch size: 67, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:34:39,784 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 09:34:43,254 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 09:34:50,930 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 09:34:51,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 3.041e+01 3.408e+01 4.043e+01 9.843e+01, threshold=6.816e+01, percent-clipped=1.0 2024-08-10 09:35:07,167 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 09:35:20,092 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 09:35:25,916 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5250, loss[loss=0.1139, beats_loss=0.01408, ecapa_loss=0.0002212, whisper_loss=0.09764, over 15703.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01226, ecapa_loss=0.0002715, whisper_loss=0.09629, over 3863261.61 frames. ], batch size: 62, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:35:27,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.08 vs. limit=22.5 2024-08-10 09:35:42,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487370.0, ans=0.1 2024-08-10 09:36:00,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=487470.0, ans=0.125 2024-08-10 09:36:02,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2024-08-10 09:36:07,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. 
limit=5.0 2024-08-10 09:36:13,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=487570.0, ans=0.125 2024-08-10 09:36:16,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=487670.0, ans=0.0 2024-08-10 09:36:21,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=487670.0, ans=0.125 2024-08-10 09:36:29,509 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5300, loss[loss=0.08893, beats_loss=0.01206, ecapa_loss=0.0002999, whisper_loss=0.07388, over 14161.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01219, ecapa_loss=0.0002711, whisper_loss=0.09681, over 3888121.95 frames. ], batch size: 59, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:36:30,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=487770.0, ans=0.125 2024-08-10 09:36:51,074 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 09:36:54,335 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-10 09:36:58,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 3.142e+01 3.530e+01 4.338e+01 6.802e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-10 09:37:19,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=488170.0, ans=0.0 2024-08-10 09:37:33,259 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5350, loss[loss=0.1024, beats_loss=0.01118, ecapa_loss=0.0002642, whisper_loss=0.08861, over 17693.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01212, ecapa_loss=0.0002736, whisper_loss=0.09635, over 3875987.50 frames. 
], batch size: 69, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:37:40,894 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 09:37:52,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=488370.0, ans=0.0 2024-08-10 09:37:52,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=488370.0, ans=0.125 2024-08-10 09:37:57,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488470.0, ans=0.125 2024-08-10 09:38:00,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488470.0, ans=0.1 2024-08-10 09:38:10,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.12 vs. limit=10.0 2024-08-10 09:38:19,465 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 09:38:33,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=488670.0, ans=0.1 2024-08-10 09:38:36,678 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5400, loss[loss=0.108, beats_loss=0.0118, ecapa_loss=0.0002526, whisper_loss=0.09369, over 23605.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01209, ecapa_loss=0.0002734, whisper_loss=0.0958, over 3884550.53 frames. ], batch size: 93, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:38:44,192 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 09:38:46,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=488770.0, ans=22.5 2024-08-10 09:38:46,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=488770.0, ans=0.0 2024-08-10 09:39:05,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.881e+01 3.134e+01 3.602e+01 5.252e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-10 09:39:20,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=489070.0, ans=0.0 2024-08-10 09:39:39,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5450, loss[loss=0.1085, beats_loss=0.01386, ecapa_loss=0.0002485, whisper_loss=0.09213, over 21289.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01213, ecapa_loss=0.0002727, whisper_loss=0.09646, over 3877667.04 frames. ], batch size: 85, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:39:41,099 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 09:39:42,753 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-10 09:39:50,292 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 09:39:50,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=489270.0, ans=0.125 2024-08-10 09:39:51,531 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 09:39:55,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. 
limit=15.0 2024-08-10 09:40:01,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=489370.0, ans=0.2 2024-08-10 09:40:04,071 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 09:40:13,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=489470.0, ans=0.125 2024-08-10 09:40:14,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=489470.0, ans=0.05 2024-08-10 09:40:15,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=12.0 2024-08-10 09:40:30,896 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 09:40:43,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5500, loss[loss=0.09591, beats_loss=0.014, ecapa_loss=0.0002691, whisper_loss=0.07922, over 21928.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01213, ecapa_loss=0.0002744, whisper_loss=0.09644, over 3868557.82 frames. ], batch size: 92, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:40:50,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489770.0, ans=0.1 2024-08-10 09:40:59,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-08-10 09:41:00,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489870.0, ans=0.125 2024-08-10 09:41:01,357 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 09:41:12,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 3.173e+01 3.591e+01 4.081e+01 1.350e+02, threshold=7.183e+01, percent-clipped=1.0 2024-08-10 09:41:12,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=489970.0, ans=0.125 2024-08-10 09:41:30,001 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 09:41:35,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=490170.0, ans=0.125 2024-08-10 09:41:45,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=490170.0, ans=0.2 2024-08-10 09:41:47,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490270.0, ans=0.1 2024-08-10 09:41:47,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5550, loss[loss=0.1164, beats_loss=0.01082, ecapa_loss=0.0002856, whisper_loss=0.1027, over 16130.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.0122, ecapa_loss=0.0002743, whisper_loss=0.09667, over 3882018.33 frames. ], batch size: 64, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:42:03,661 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 09:42:10,918 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 09:42:17,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=490470.0, ans=0.07 2024-08-10 09:42:17,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490470.0, ans=0.125 2024-08-10 09:42:36,461 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 09:42:48,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490670.0, ans=0.1 2024-08-10 09:42:51,082 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5600, loss[loss=0.1221, beats_loss=0.009744, ecapa_loss=0.0003997, whisper_loss=0.1084, over 15740.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01211, ecapa_loss=0.000275, whisper_loss=0.09674, over 3856877.69 frames. ], batch size: 68, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:42:53,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=490770.0, ans=0.0 2024-08-10 09:43:06,585 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 09:43:06,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.34 vs. limit=10.0 2024-08-10 09:43:19,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490970.0, ans=0.0 2024-08-10 09:43:20,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 3.015e+01 3.404e+01 4.297e+01 6.726e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-10 09:43:22,918 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
20 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 09:43:42,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=8.0 2024-08-10 09:43:45,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=491170.0, ans=0.125 2024-08-10 09:43:47,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=491170.0, ans=0.125 2024-08-10 09:43:48,021 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 09:43:54,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491270.0, ans=0.1 2024-08-10 09:43:55,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5650, loss[loss=0.09, beats_loss=0.01411, ecapa_loss=0.0002188, whisper_loss=0.0737, over 16926.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01215, ecapa_loss=0.0002738, whisper_loss=0.09601, over 3890237.78 frames. ], batch size: 68, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:44:03,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=491270.0, ans=0.0 2024-08-10 09:44:08,616 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 09:44:43,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=6.0 2024-08-10 09:44:51,283 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. 
limit=15.0 2024-08-10 09:44:52,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491670.0, ans=0.1 2024-08-10 09:44:53,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-08-10 09:44:54,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=491670.0, ans=0.2 2024-08-10 09:44:59,317 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5700, loss[loss=0.1148, beats_loss=0.01409, ecapa_loss=0.0002993, whisper_loss=0.0977, over 13859.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01207, ecapa_loss=0.0002757, whisper_loss=0.09679, over 3911944.56 frames. ], batch size: 58, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:45:17,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=491870.0, ans=0.0 2024-08-10 09:45:33,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.076e+01 3.438e+01 4.149e+01 8.224e+01, threshold=6.876e+01, percent-clipped=3.0 2024-08-10 09:45:39,358 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 09:45:41,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=491970.0, ans=0.0 2024-08-10 09:46:14,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5750, loss[loss=0.1134, beats_loss=0.01293, ecapa_loss=0.0002709, whisper_loss=0.09773, over 22582.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01212, ecapa_loss=0.0002743, whisper_loss=0.0966, over 3915262.90 frames. 
], batch size: 93, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:46:27,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=492270.0, ans=0.0 2024-08-10 09:46:40,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.89 vs. limit=10.0 2024-08-10 09:46:41,529 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 09:46:54,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=492470.0, ans=0.125 2024-08-10 09:46:59,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=12.0 2024-08-10 09:46:59,932 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 09:47:11,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=492570.0, ans=0.125 2024-08-10 09:47:23,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492670.0, ans=0.1 2024-08-10 09:47:32,108 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 09:47:37,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5800, loss[loss=0.09724, beats_loss=0.01206, ecapa_loss=0.000224, whisper_loss=0.08294, over 14262.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01204, ecapa_loss=0.0002757, whisper_loss=0.09724, over 3873054.06 frames. 
], batch size: 54, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:47:48,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=492770.0, ans=0.125 2024-08-10 09:47:50,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-10 09:48:02,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=492870.0, ans=0.2 2024-08-10 09:48:05,344 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.573e-02 2024-08-10 09:48:09,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=492970.0, ans=0.125 2024-08-10 09:48:10,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.18 vs. limit=22.5 2024-08-10 09:48:11,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 3.153e+01 3.469e+01 4.030e+01 1.339e+02, threshold=6.938e+01, percent-clipped=1.0 2024-08-10 09:48:16,243 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-10 09:48:17,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492970.0, ans=0.1 2024-08-10 09:48:18,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=492970.0, ans=0.125 2024-08-10 09:48:28,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. 
limit=22.5 2024-08-10 09:48:41,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=493170.0, ans=0.2 2024-08-10 09:48:45,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=493170.0, ans=0.125 2024-08-10 09:48:47,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5850, loss[loss=0.1118, beats_loss=0.01119, ecapa_loss=0.0002545, whisper_loss=0.09804, over 15590.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01214, ecapa_loss=0.0002767, whisper_loss=0.09649, over 3861587.64 frames. ], batch size: 61, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:48:48,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2024-08-10 09:48:54,416 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 09:48:59,434 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 09:49:07,087 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-10 09:49:10,982 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 09:49:14,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=493470.0, ans=0.125 2024-08-10 09:49:18,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=493470.0, ans=0.025 2024-08-10 09:49:20,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-10 09:49:37,203 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 09:49:39,827 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 09:49:40,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2024-08-10 09:49:51,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5900, loss[loss=0.1213, beats_loss=0.01379, ecapa_loss=0.0002583, whisper_loss=0.1049, over 20778.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01217, ecapa_loss=0.0002752, whisper_loss=0.09662, over 3839783.56 frames. ], batch size: 85, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:50:07,953 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 13 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 09:50:09,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=493870.0, ans=0.09899494936611666 2024-08-10 09:50:09,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2024-08-10 09:50:17,781 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 09:50:19,382 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 09:50:20,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+01 2.982e+01 3.256e+01 3.844e+01 1.503e+02, threshold=6.513e+01, percent-clipped=1.0 2024-08-10 09:50:23,188 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 09:50:29,786 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 8 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 09:50:47,652 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 09:50:54,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 5950, loss[loss=0.09051, beats_loss=0.01277, ecapa_loss=0.0002867, whisper_loss=0.07487, over 20178.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0122, ecapa_loss=0.000274, whisper_loss=0.09584, over 3856306.05 frames. ], batch size: 84, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:50:55,482 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-08-10 09:51:19,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=494470.0, ans=0.0 2024-08-10 09:51:32,078 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 09:51:33,405 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 09:51:34,618 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-10 09:51:41,135 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 09:51:58,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6000, loss[loss=0.1177, beats_loss=0.01273, ecapa_loss=0.0002648, whisper_loss=0.1023, over 22999.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01228, ecapa_loss=0.000272, whisper_loss=0.09564, over 3841575.66 frames. ], batch size: 92, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:51:58,755 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 09:52:40,028 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on ASR_libri: loss=0.2669, beats_loss=0, ecapa_loss=0.0008114, whisper_loss=0.2588, over 922467.00 frames. 
2024-08-10 09:52:55,508 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on SV_voxceleb1: loss=0.00707, beats_loss=0, ecapa_loss=0.000707, whisper_loss=0, over 939242.00 frames. 2024-08-10 09:53:13,895 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8594, 1.7548, 1.5644, 1.4747], device='cuda:1') 2024-08-10 09:54:03,507 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4054, 2.8023, 3.1385, 2.7192], device='cuda:1') 2024-08-10 09:54:53,688 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on AT_audioset: loss=0.028, beats_loss=0.028, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 09:54:53,691 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 09:54:55,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=494770.0, ans=0.2 2024-08-10 09:55:01,614 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-10 09:55:07,972 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 09:55:10,311 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 09:55:14,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494870.0, ans=0.125 2024-08-10 09:55:23,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.994e+01 3.624e+01 4.180e+01 6.998e+01, threshold=7.249e+01, percent-clipped=2.0 2024-08-10 09:55:39,911 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 09:55:43,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=495070.0, ans=0.0 2024-08-10 09:55:58,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6050, loss[loss=0.1066, beats_loss=0.01445, ecapa_loss=0.0003566, whisper_loss=0.08861, over 20255.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0123, ecapa_loss=0.0002704, whisper_loss=0.09584, over 3822331.75 frames. ], batch size: 90, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:56:07,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495270.0, ans=0.1 2024-08-10 09:56:11,297 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 09:56:13,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=495370.0, ans=0.0 2024-08-10 09:56:16,120 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 09:56:19,915 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 09:56:20,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=495370.0, ans=0.0 2024-08-10 09:56:39,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=495570.0, ans=0.125 2024-08-10 09:56:43,019 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 09:56:45,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-08-10 09:57:02,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6100, loss[loss=0.1017, beats_loss=0.01292, ecapa_loss=0.000245, whisper_loss=0.08629, over 19280.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01226, ecapa_loss=0.0002672, whisper_loss=0.09623, over 3846002.43 frames. ], batch size: 77, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:57:06,549 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 09:57:10,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-08-10 09:57:11,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=495770.0, ans=0.0 2024-08-10 09:57:19,463 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 09:57:20,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=495870.0, ans=0.0 2024-08-10 09:57:26,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-08-10 09:57:31,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.767e+01 3.161e+01 3.682e+01 7.056e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 09:57:31,811 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 09:57:47,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=496070.0, ans=0.2 2024-08-10 09:58:01,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.26 vs. 
limit=15.0 2024-08-10 09:58:06,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6150, loss[loss=0.1023, beats_loss=0.01233, ecapa_loss=0.0002718, whisper_loss=0.08722, over 17250.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01227, ecapa_loss=0.0002673, whisper_loss=0.09614, over 3840136.72 frames. ], batch size: 72, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:58:12,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=496270.0, ans=0.125 2024-08-10 09:58:13,536 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 09:58:37,828 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 09:58:40,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496470.0, ans=0.0 2024-08-10 09:58:41,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=496470.0, ans=0.04949747468305833 2024-08-10 09:59:07,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=496670.0, ans=0.125 2024-08-10 09:59:09,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=496770.0, ans=0.125 2024-08-10 09:59:10,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6200, loss[loss=0.08721, beats_loss=0.01418, ecapa_loss=0.0002614, whisper_loss=0.07042, over 14726.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01223, ecapa_loss=0.000269, whisper_loss=0.0963, over 3849309.27 frames. ], batch size: 60, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 09:59:10,893 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
38 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-10 09:59:12,206 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-10 09:59:13,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=496770.0, ans=0.2 2024-08-10 09:59:24,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=496870.0, ans=0.2 2024-08-10 09:59:29,342 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 09:59:29,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=496870.0, ans=0.125 2024-08-10 09:59:30,668 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 09:59:34,821 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 09:59:39,746 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 09:59:40,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 3.143e+01 3.568e+01 4.018e+01 6.093e+01, threshold=7.137e+01, percent-clipped=0.0 2024-08-10 09:59:55,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=497070.0, ans=0.125 2024-08-10 10:00:02,177 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 16 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 10:00:06,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=497170.0, ans=0.125 2024-08-10 10:00:15,283 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 10:00:16,268 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6250, loss[loss=0.1127, beats_loss=0.0121, ecapa_loss=0.0002345, whisper_loss=0.09825, over 22816.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01212, ecapa_loss=0.00027, whisper_loss=0.09632, over 3847909.35 frames. ], batch size: 91, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:00:34,277 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 10:00:39,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=497370.0, ans=0.125 2024-08-10 10:00:56,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=497570.0, ans=0.125 2024-08-10 10:00:59,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-10 10:01:03,968 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 10:01:08,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=497670.0, ans=0.0 2024-08-10 10:01:13,069 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 10:01:20,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6300, loss[loss=0.1233, beats_loss=0.01298, ecapa_loss=0.0002364, whisper_loss=0.108, over 22245.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01216, ecapa_loss=0.0002701, whisper_loss=0.09652, over 3872399.77 frames. ], batch size: 86, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:01:28,393 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 10:01:42,843 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
20 from LS+wenet, 33 from Vox, 41 fro AS 2024-08-10 10:01:44,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497870.0, ans=0.1 2024-08-10 10:01:50,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+01 3.096e+01 3.544e+01 4.139e+01 6.723e+01, threshold=7.089e+01, percent-clipped=0.0 2024-08-10 10:01:53,381 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 13 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 10:02:03,810 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 10:02:16,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-08-10 10:02:25,842 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6350, loss[loss=0.1189, beats_loss=0.01207, ecapa_loss=0.0002743, whisper_loss=0.1041, over 22232.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01218, ecapa_loss=0.0002707, whisper_loss=0.09629, over 3845258.25 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:02:32,213 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 10:02:48,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=498370.0, ans=0.2 2024-08-10 10:02:54,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2024-08-10 10:03:12,098 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 10:03:29,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6400, loss[loss=0.1235, beats_loss=0.01252, ecapa_loss=0.0002382, whisper_loss=0.1086, over 19429.00 frames. 
], tot_loss[loss=0.1114, beats_loss=0.0122, ecapa_loss=0.0002681, whisper_loss=0.09653, over 3852088.04 frames. ], batch size: 76, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:03:35,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=498770.0, ans=0.0 2024-08-10 10:03:45,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=498870.0, ans=0.2 2024-08-10 10:04:01,637 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.035e+01 3.531e+01 4.097e+01 5.944e+01, threshold=7.062e+01, percent-clipped=0.0 2024-08-10 10:04:03,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=498970.0, ans=0.2 2024-08-10 10:04:07,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=498970.0, ans=0.0 2024-08-10 10:04:09,475 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 10:04:13,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2024-08-10 10:04:19,724 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-10 10:04:23,978 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 10:04:28,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499170.0, ans=0.1 2024-08-10 10:04:43,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6450, loss[loss=0.09742, beats_loss=0.01095, ecapa_loss=0.0002998, whisper_loss=0.08347, over 21385.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01214, ecapa_loss=0.0002687, whisper_loss=0.09685, over 3906001.51 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:04:49,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499270.0, ans=0.1 2024-08-10 10:04:51,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499270.0, ans=0.125 2024-08-10 10:04:59,712 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 10:05:02,772 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 10:05:06,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=499370.0, ans=0.125 2024-08-10 10:05:08,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-10 10:05:09,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=499370.0, ans=0.0 2024-08-10 10:05:09,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=499370.0, ans=0.0 2024-08-10 10:05:13,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=499370.0, ans=0.025 2024-08-10 10:05:17,145 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-10 10:05:20,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=499470.0, ans=0.0 2024-08-10 10:05:49,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-10 10:05:57,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=499770.0, ans=0.125 2024-08-10 10:05:58,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6500, loss[loss=0.09842, beats_loss=0.01301, ecapa_loss=0.0002577, whisper_loss=0.08283, over 16646.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0121, ecapa_loss=0.0002683, whisper_loss=0.09758, over 3913296.11 frames. ], batch size: 66, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:05:59,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=499770.0, ans=0.0 2024-08-10 10:06:01,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=15.0 2024-08-10 10:06:10,213 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 10:06:11,585 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 34 from Vox, 27 fro AS 2024-08-10 10:06:13,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=499870.0, ans=0.125 2024-08-10 10:06:16,273 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 10:06:18,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=499870.0, ans=0.125 2024-08-10 10:06:18,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-08-10 10:06:25,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499870.0, ans=0.1 2024-08-10 10:06:33,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.134e+01 3.492e+01 3.881e+01 6.321e+01, threshold=6.984e+01, percent-clipped=0.0 2024-08-10 10:06:34,148 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 10:06:41,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=499970.0, ans=0.04949747468305833 2024-08-10 10:06:45,311 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-10 10:07:04,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=500170.0, ans=0.5 2024-08-10 10:07:05,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=500170.0, ans=0.125 2024-08-10 10:07:15,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6550, loss[loss=0.1375, beats_loss=0.01279, ecapa_loss=0.0002299, whisper_loss=0.1224, over 24262.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01211, ecapa_loss=0.0002694, whisper_loss=0.09758, over 3932474.14 frames. 
], batch size: 93, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:07:32,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=500370.0, ans=0.0 2024-08-10 10:07:59,673 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 10:08:07,981 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:08:11,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=500570.0, ans=0.2 2024-08-10 10:08:13,109 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 10:08:19,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=500570.0, ans=0.1 2024-08-10 10:08:19,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2024-08-10 10:08:28,650 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 10:08:33,141 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 10:08:41,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6600, loss[loss=0.1275, beats_loss=0.01112, ecapa_loss=0.0003297, whisper_loss=0.1131, over 18785.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01199, ecapa_loss=0.0002696, whisper_loss=0.0981, over 3919221.91 frames. ], batch size: 76, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:08:52,091 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
32 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 10:08:52,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=500770.0, ans=0.1 2024-08-10 10:08:54,626 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 10:09:13,927 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 10:09:18,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 3.113e+01 3.580e+01 3.995e+01 6.180e+01, threshold=7.160e+01, percent-clipped=0.0 2024-08-10 10:09:25,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=500970.0, ans=0.125 2024-08-10 10:09:29,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=501070.0, ans=0.5 2024-08-10 10:09:33,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=501070.0, ans=10.0 2024-08-10 10:09:38,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=501070.0, ans=0.125 2024-08-10 10:09:50,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=501170.0, ans=0.125 2024-08-10 10:09:56,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=501170.0, ans=0.05 2024-08-10 10:10:00,898 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6650, loss[loss=0.1229, beats_loss=0.009912, ecapa_loss=0.0002672, whisper_loss=0.1103, over 22028.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01209, ecapa_loss=0.0002679, whisper_loss=0.09774, over 3927346.55 frames. 
], batch size: 84, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:10:03,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-10 10:10:20,801 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 10:10:44,177 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 10:10:47,534 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.231e+00 2024-08-10 10:10:54,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-10 10:11:02,385 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 10:11:06,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=501670.0, ans=0.125 2024-08-10 10:11:12,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=501670.0, ans=0.125 2024-08-10 10:11:13,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=501670.0, ans=0.1 2024-08-10 10:11:21,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6700, loss[loss=0.0982, beats_loss=0.011, ecapa_loss=0.0003024, whisper_loss=0.08417, over 16751.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01212, ecapa_loss=0.0002679, whisper_loss=0.09727, over 3920243.12 frames. 
], batch size: 66, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:11:36,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=501870.0, ans=0.2 2024-08-10 10:11:39,680 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 10:11:41,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=501870.0, ans=0.0 2024-08-10 10:11:47,587 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 35 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 10:12:00,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.966e+01 3.489e+01 3.963e+01 6.232e+01, threshold=6.977e+01, percent-clipped=0.0 2024-08-10 10:12:04,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=501970.0, ans=0.125 2024-08-10 10:12:41,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=502170.0, ans=0.125 2024-08-10 10:12:45,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6750, loss[loss=0.111, beats_loss=0.01038, ecapa_loss=0.0002601, whisper_loss=0.09805, over 15974.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01211, ecapa_loss=0.0002687, whisper_loss=0.09758, over 3895828.46 frames. 
], batch size: 60, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:12:53,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=502270.0, ans=0.2 2024-08-10 10:12:55,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=502270.0, ans=0.125 2024-08-10 10:13:30,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502470.0, ans=0.125 2024-08-10 10:13:42,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=502570.0, ans=0.125 2024-08-10 10:13:45,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=502570.0, ans=0.125 2024-08-10 10:13:49,203 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-10 10:13:54,550 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 10:14:09,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=502670.0, ans=0.125 2024-08-10 10:14:11,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6800, loss[loss=0.09409, beats_loss=0.01447, ecapa_loss=0.0002532, whisper_loss=0.0771, over 20814.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01219, ecapa_loss=0.0002684, whisper_loss=0.09654, over 3883600.64 frames. ], batch size: 84, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:14:15,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. 
limit=6.0 2024-08-10 10:14:18,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=502770.0, ans=0.2 2024-08-10 10:14:50,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.003e+01 3.545e+01 4.063e+01 8.445e+01, threshold=7.089e+01, percent-clipped=2.0 2024-08-10 10:15:07,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=503070.0, ans=0.0 2024-08-10 10:15:08,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=503070.0, ans=0.2 2024-08-10 10:15:11,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2024-08-10 10:15:13,609 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-10 10:15:19,385 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.251e+00 2024-08-10 10:15:29,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503170.0, ans=0.1 2024-08-10 10:15:31,234 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 10:15:33,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=503170.0, ans=0.07 2024-08-10 10:15:35,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6850, loss[loss=0.08239, beats_loss=0.01243, ecapa_loss=0.0002949, whisper_loss=0.06701, over 18791.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01219, ecapa_loss=0.000268, whisper_loss=0.09634, over 3902449.87 frames. 
], batch size: 76, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:15:37,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=503270.0, ans=0.125 2024-08-10 10:15:38,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2024-08-10 10:15:52,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503370.0, ans=0.125 2024-08-10 10:16:05,606 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 10:16:14,286 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 10:16:16,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-10 10:16:28,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-10 10:16:54,215 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6900, loss[loss=0.1143, beats_loss=0.01206, ecapa_loss=0.0002523, whisper_loss=0.09967, over 19877.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01223, ecapa_loss=0.0002682, whisper_loss=0.09542, over 3865666.79 frames. 
], batch size: 80, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:17:30,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.010e+01 3.385e+01 3.920e+01 6.674e+01, threshold=6.771e+01, percent-clipped=0.0 2024-08-10 10:18:06,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=504170.0, ans=0.0 2024-08-10 10:18:14,585 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 6950, loss[loss=0.1153, beats_loss=0.01355, ecapa_loss=0.0002042, whisper_loss=0.09973, over 21687.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01229, ecapa_loss=0.0002633, whisper_loss=0.09611, over 3907599.40 frames. ], batch size: 84, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:18:53,759 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-10 10:18:53,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504470.0, ans=0.1 2024-08-10 10:19:06,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-08-10 10:19:11,155 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 10:19:19,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=504670.0, ans=0.2 2024-08-10 10:19:31,770 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 10:19:32,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. 
limit=6.0 2024-08-10 10:19:36,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7000, loss[loss=0.1154, beats_loss=0.01034, ecapa_loss=0.0002674, whisper_loss=0.1024, over 16845.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01229, ecapa_loss=0.0002649, whisper_loss=0.09575, over 3874231.18 frames. ], batch size: 65, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:19:39,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=504770.0, ans=0.0 2024-08-10 10:19:46,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.73 vs. limit=10.0 2024-08-10 10:19:53,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=504870.0, ans=0.125 2024-08-10 10:19:56,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=504870.0, ans=0.125 2024-08-10 10:20:04,219 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 10:20:09,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504970.0, ans=0.1 2024-08-10 10:20:09,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=504970.0, ans=0.125 2024-08-10 10:20:10,902 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 10:20:12,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.871e+01 3.202e+01 3.824e+01 7.169e+01, threshold=6.405e+01, percent-clipped=1.0 2024-08-10 10:20:16,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=504970.0, ans=0.0 2024-08-10 10:20:26,724 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 10:20:35,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=505070.0, ans=0.125 2024-08-10 10:20:39,419 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2024-08-10 10:20:40,185 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 10:20:42,194 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.074e-01 2024-08-10 10:20:57,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7050, loss[loss=0.1238, beats_loss=0.01027, ecapa_loss=0.0002697, whisper_loss=0.1108, over 20009.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01225, ecapa_loss=0.0002641, whisper_loss=0.09579, over 3893745.08 frames. ], batch size: 78, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:20:58,171 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 10:21:04,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505270.0, ans=0.1 2024-08-10 10:21:10,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505270.0, ans=0.125 2024-08-10 10:21:11,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=12.0 2024-08-10 10:21:33,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=505470.0, ans=0.0 2024-08-10 10:21:36,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=505470.0, ans=0.0 2024-08-10 10:21:46,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=505570.0, ans=0.0 2024-08-10 10:21:48,806 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 10:22:16,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7100, loss[loss=0.1076, beats_loss=0.01499, ecapa_loss=0.0002207, whisper_loss=0.09041, over 16782.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01231, ecapa_loss=0.0002623, whisper_loss=0.09502, over 3854257.46 frames. 
], batch size: 67, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:22:34,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=505870.0, ans=0.0 2024-08-10 10:22:51,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=505970.0, ans=0.125 2024-08-10 10:22:54,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.041e+01 3.472e+01 4.120e+01 8.517e+01, threshold=6.943e+01, percent-clipped=2.0 2024-08-10 10:23:21,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=506170.0, ans=0.2 2024-08-10 10:23:23,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506170.0, ans=0.1 2024-08-10 10:23:25,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=506170.0, ans=0.125 2024-08-10 10:23:30,223 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 10:23:32,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=506170.0, ans=0.125 2024-08-10 10:23:36,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7150, loss[loss=0.1023, beats_loss=0.01609, ecapa_loss=0.0002354, whisper_loss=0.08388, over 18765.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01228, ecapa_loss=0.0002625, whisper_loss=0.09532, over 3881975.75 frames. ], batch size: 76, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:23:44,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=506270.0, ans=0.0 2024-08-10 10:23:45,867 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 10:23:49,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=506270.0, ans=0.0 2024-08-10 10:23:49,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=506270.0, ans=0.125 2024-08-10 10:24:01,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=506370.0, ans=0.125 2024-08-10 10:24:09,240 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 10:24:15,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=506470.0, ans=0.0 2024-08-10 10:24:15,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=506470.0, ans=0.04949747468305833 2024-08-10 10:24:29,021 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.080e-01 2024-08-10 10:24:33,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=506570.0, ans=0.125 2024-08-10 10:24:37,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=506670.0, ans=0.125 2024-08-10 10:24:47,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-10 10:24:52,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=506670.0, ans=0.0 2024-08-10 10:24:55,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7200, loss[loss=0.1058, beats_loss=0.01332, ecapa_loss=0.0002268, whisper_loss=0.09023, over 22464.00 frames. 
], tot_loss[loss=0.1099, beats_loss=0.01235, ecapa_loss=0.0002628, whisper_loss=0.09494, over 3875572.73 frames. ], batch size: 88, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:25:08,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.09 vs. limit=8.0 2024-08-10 10:25:09,938 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.04 vs. limit=15.0 2024-08-10 10:25:35,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 3.179e+01 3.637e+01 4.087e+01 6.923e+01, threshold=7.273e+01, percent-clipped=0.0 2024-08-10 10:25:39,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2024-08-10 10:25:43,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506970.0, ans=0.1 2024-08-10 10:25:47,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2024-08-10 10:25:54,580 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 10:26:00,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=507070.0, ans=0.2 2024-08-10 10:26:18,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7250, loss[loss=0.136, beats_loss=0.009159, ecapa_loss=0.0003257, whisper_loss=0.1236, over 15174.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01232, ecapa_loss=0.000263, whisper_loss=0.09508, over 3860547.19 frames. ], batch size: 60, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:26:30,065 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 10:26:43,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=507370.0, ans=0.025 2024-08-10 10:26:58,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=507470.0, ans=0.0 2024-08-10 10:27:16,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=507570.0, ans=0.07 2024-08-10 10:27:21,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=507670.0, ans=0.07 2024-08-10 10:27:26,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=507670.0, ans=0.125 2024-08-10 10:27:27,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=507670.0, ans=0.125 2024-08-10 10:27:30,846 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 10:27:37,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7300, loss[loss=0.1273, beats_loss=0.01055, ecapa_loss=0.0002281, whisper_loss=0.1145, over 16192.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01223, ecapa_loss=0.0002649, whisper_loss=0.0957, over 3877228.97 frames. ], batch size: 58, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:27:54,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=507870.0, ans=0.5 2024-08-10 10:27:58,451 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-10 10:28:08,724 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
24 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-10 10:28:16,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.965e+01 3.375e+01 3.820e+01 5.473e+01, threshold=6.750e+01, percent-clipped=0.0 2024-08-10 10:28:23,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507970.0, ans=0.1 2024-08-10 10:28:50,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=508170.0, ans=0.0 2024-08-10 10:28:59,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7350, loss[loss=0.1185, beats_loss=0.01134, ecapa_loss=0.000256, whisper_loss=0.1046, over 22807.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01219, ecapa_loss=0.0002661, whisper_loss=0.09541, over 3905396.62 frames. ], batch size: 92, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:29:16,071 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 10:29:18,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=508370.0, ans=0.2 2024-08-10 10:29:20,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=508370.0, ans=0.0 2024-08-10 10:29:36,619 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 10:29:43,520 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 10:30:03,529 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 10:30:14,944 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 10:30:26,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7400, loss[loss=0.1051, beats_loss=0.01337, ecapa_loss=0.0002679, whisper_loss=0.08901, over 17956.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01222, ecapa_loss=0.0002655, whisper_loss=0.0952, over 3896998.86 frames. ], batch size: 72, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:30:30,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=508770.0, ans=0.125 2024-08-10 10:30:34,143 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 10:30:44,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2024-08-10 10:30:51,281 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 10:30:52,462 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
28 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 10:30:55,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508870.0, ans=0.1 2024-08-10 10:31:05,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.905e+01 3.226e+01 3.755e+01 5.990e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-10 10:31:23,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=509070.0, ans=0.125 2024-08-10 10:31:25,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=509070.0, ans=0.125 2024-08-10 10:31:25,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=509070.0, ans=0.0 2024-08-10 10:31:52,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7450, loss[loss=0.09421, beats_loss=0.01223, ecapa_loss=0.0003436, whisper_loss=0.07854, over 19064.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01221, ecapa_loss=0.000267, whisper_loss=0.0954, over 3884818.84 frames. ], batch size: 83, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:32:19,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509370.0, ans=0.125 2024-08-10 10:32:20,602 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 13 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 10:32:33,986 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 27 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-10 10:32:35,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.00 vs. 
limit=22.5 2024-08-10 10:32:41,674 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.806e+03 2024-08-10 10:32:42,636 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 40 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 10:32:45,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=509570.0, ans=0.125 2024-08-10 10:33:00,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509670.0, ans=0.1 2024-08-10 10:33:05,220 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 10:33:06,809 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 10:33:18,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7500, loss[loss=0.1111, beats_loss=0.01173, ecapa_loss=0.0002846, whisper_loss=0.09652, over 22388.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0121, ecapa_loss=0.000267, whisper_loss=0.09604, over 3882542.82 frames. ], batch size: 88, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:33:31,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-10 10:33:33,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-10 10:33:53,097 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-10 10:33:58,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.154e+01 3.513e+01 4.160e+01 5.952e+01, threshold=7.025e+01, percent-clipped=0.0 2024-08-10 10:34:19,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=510070.0, ans=0.125 2024-08-10 10:34:32,236 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 10:34:34,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510170.0, ans=0.1 2024-08-10 10:34:35,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=510170.0, ans=0.125 2024-08-10 10:34:39,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=510170.0, ans=0.07 2024-08-10 10:34:43,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7550, loss[loss=0.1243, beats_loss=0.009332, ecapa_loss=0.0002938, whisper_loss=0.1121, over 16271.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01215, ecapa_loss=0.0002671, whisper_loss=0.09629, over 3903812.11 frames. ], batch size: 63, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:35:02,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=510370.0, ans=0.125 2024-08-10 10:35:03,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-08-10 10:35:14,701 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 10:35:25,388 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
20 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-10 10:35:31,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510570.0, ans=0.1 2024-08-10 10:36:07,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7600, loss[loss=0.1032, beats_loss=0.01153, ecapa_loss=0.0002385, whisper_loss=0.08925, over 20689.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0121, ecapa_loss=0.0002668, whisper_loss=0.09648, over 3907348.21 frames. ], batch size: 79, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:36:09,807 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 10:36:27,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=510870.0, ans=0.2 2024-08-10 10:36:45,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.819e+01 3.165e+01 3.521e+01 5.971e+01, threshold=6.331e+01, percent-clipped=0.0 2024-08-10 10:36:46,108 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 10:36:46,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-08-10 10:36:52,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=510970.0, ans=0.0 2024-08-10 10:37:21,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=511170.0, ans=0.0 2024-08-10 10:37:21,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. 
limit=15.0 2024-08-10 10:37:34,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7650, loss[loss=0.1075, beats_loss=0.01274, ecapa_loss=0.0002733, whisper_loss=0.09203, over 22670.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01205, ecapa_loss=0.0002685, whisper_loss=0.09675, over 3923168.68 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:37:34,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511270.0, ans=0.1 2024-08-10 10:37:59,712 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 10:38:12,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=511470.0, ans=0.2 2024-08-10 10:38:14,075 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 10:38:18,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-10 10:38:24,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2024-08-10 10:38:49,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=511670.0, ans=0.0 2024-08-10 10:38:50,944 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 10:38:57,002 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 10:38:58,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7700, loss[loss=0.09732, beats_loss=0.01351, ecapa_loss=0.0002833, whisper_loss=0.08097, over 22079.00 frames. 
], tot_loss[loss=0.1107, beats_loss=0.01219, ecapa_loss=0.0002679, whisper_loss=0.09584, over 3910469.85 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:39:01,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2024-08-10 10:39:09,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511770.0, ans=0.0 2024-08-10 10:39:32,458 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 10:39:37,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=511970.0, ans=0.125 2024-08-10 10:39:39,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 3.237e+01 3.581e+01 4.281e+01 8.585e+01, threshold=7.162e+01, percent-clipped=2.0 2024-08-10 10:39:54,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=512070.0, ans=0.0 2024-08-10 10:40:09,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=512170.0, ans=0.125 2024-08-10 10:40:16,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2024-08-10 10:40:16,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=512170.0, ans=0.125 2024-08-10 10:40:17,826 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 10:40:22,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7750, loss[loss=0.1001, beats_loss=0.01442, ecapa_loss=0.0002067, whisper_loss=0.08356, over 23244.00 frames. 
], tot_loss[loss=0.1103, beats_loss=0.01218, ecapa_loss=0.0002671, whisper_loss=0.09543, over 3885885.05 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:40:37,195 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=1.054e-02 2024-08-10 10:40:55,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=512470.0, ans=0.2 2024-08-10 10:41:11,291 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 10:41:12,947 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 10:41:27,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=512670.0, ans=0.0 2024-08-10 10:41:31,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512670.0, ans=0.125 2024-08-10 10:41:34,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=512670.0, ans=0.2 2024-08-10 10:41:45,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7800, loss[loss=0.1006, beats_loss=0.01319, ecapa_loss=0.0003143, whisper_loss=0.08426, over 19473.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01222, ecapa_loss=0.0002651, whisper_loss=0.09547, over 3898489.87 frames. ], batch size: 83, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:41:58,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2024-08-10 10:42:14,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.99 vs. 
limit=22.5 2024-08-10 10:42:23,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.017e+01 3.377e+01 3.890e+01 5.572e+01, threshold=6.753e+01, percent-clipped=0.0 2024-08-10 10:42:25,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=512970.0, ans=0.025 2024-08-10 10:42:45,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-08-10 10:42:51,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=513170.0, ans=0.0 2024-08-10 10:42:55,920 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 10:43:01,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-10 10:43:03,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7850, loss[loss=0.1127, beats_loss=0.01224, ecapa_loss=0.0002725, whisper_loss=0.09773, over 16081.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01217, ecapa_loss=0.0002668, whisper_loss=0.096, over 3895547.26 frames. ], batch size: 63, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:43:34,945 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 10:43:46,599 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 10:43:50,783 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 10:43:57,589 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 10:44:07,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-10 10:44:22,460 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 10:44:24,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=513670.0, ans=0.125 2024-08-10 10:44:27,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7900, loss[loss=0.1207, beats_loss=0.01037, ecapa_loss=0.0002362, whisper_loss=0.108, over 19970.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01225, ecapa_loss=0.0002659, whisper_loss=0.09619, over 3904554.37 frames. ], batch size: 76, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:44:30,789 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 10:44:32,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=513770.0, ans=0.125 2024-08-10 10:44:41,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=513770.0, ans=0.0 2024-08-10 10:44:45,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=513870.0, ans=0.2 2024-08-10 10:44:45,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=513870.0, ans=0.0 2024-08-10 10:44:55,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=513870.0, ans=0.125 2024-08-10 10:44:56,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.81 vs. 
limit=22.5 2024-08-10 10:45:05,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.988e+01 3.259e+01 3.767e+01 5.929e+01, threshold=6.519e+01, percent-clipped=0.0 2024-08-10 10:45:06,001 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 11 from Vox, 38 from AS 2024-08-10 10:45:26,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=514070.0, ans=0.125 2024-08-10 10:45:31,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=514070.0, ans=0.125 2024-08-10 10:45:31,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=514070.0, ans=0.05 2024-08-10 10:45:42,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=514170.0, ans=0.125 2024-08-10 10:45:50,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 7950, loss[loss=0.09478, beats_loss=0.01024, ecapa_loss=0.0003083, whisper_loss=0.08146, over 22114.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01221, ecapa_loss=0.0002658, whisper_loss=0.09648, over 3926414.40 frames. ], batch size: 90, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:45:52,241 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
28 from LS+wenet, 28 from Vox, 22 from AS 2024-08-10 10:46:05,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=514370.0, ans=0.125 2024-08-10 10:46:08,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=514370.0, ans=0.04949747468305833 2024-08-10 10:46:19,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=514370.0, ans=0.125 2024-08-10 10:46:22,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=514470.0, ans=0.0 2024-08-10 10:46:23,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2024-08-10 10:46:32,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=514470.0, ans=0.125 2024-08-10 10:46:56,951 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 27 from Vox, 29 from AS 2024-08-10 10:47:12,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8000, loss[loss=0.1246, beats_loss=0.01157, ecapa_loss=0.0002428, whisper_loss=0.1106, over 22212.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01215, ecapa_loss=0.0002653, whisper_loss=0.09721, over 3925796.19 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:47:21,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. 
limit=15.0 2024-08-10 10:47:27,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=514870.0, ans=0.2 2024-08-10 10:47:35,404 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-10 10:47:45,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=514970.0, ans=0.0 2024-08-10 10:47:52,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.849e+01 3.134e+01 3.665e+01 7.663e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 10:48:00,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2024-08-10 10:48:16,288 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 from AS 2024-08-10 10:48:39,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2024-08-10 10:48:40,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8050, loss[loss=0.1155, beats_loss=0.01074, ecapa_loss=0.0002779, whisper_loss=0.102, over 19395.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01212, ecapa_loss=0.0002655, whisper_loss=0.09705, over 3913728.65 frames. ], batch size: 76, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:49:29,248 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 10:50:01,150 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.489e-02 2024-08-10 10:50:31,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=515670.0, ans=0.2 2024-08-10 10:50:31,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-08-10 10:50:32,648 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 10:50:34,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=515670.0, ans=0.125 2024-08-10 10:50:36,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=515670.0, ans=0.125 2024-08-10 10:50:41,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8100, loss[loss=0.1363, beats_loss=0.009445, ecapa_loss=0.0002803, whisper_loss=0.1241, over 21163.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01206, ecapa_loss=0.0002658, whisper_loss=0.09764, over 3875352.93 frames. ], batch size: 81, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:50:48,376 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 from AS 2024-08-10 10:50:55,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-10 10:50:56,703 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 from AS 2024-08-10 10:51:03,207 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
14 from LS+wenet, 20 from Vox, 22 from AS 2024-08-10 10:51:04,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0 2024-08-10 10:51:07,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=515870.0, ans=0.125 2024-08-10 10:51:20,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.111e+01 3.674e+01 4.170e+01 5.858e+01, threshold=7.349e+01, percent-clipped=0.0 2024-08-10 10:51:28,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515970.0, ans=0.125 2024-08-10 10:51:34,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=516070.0, ans=0.2 2024-08-10 10:51:58,914 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 31 from Vox, 34 from AS 2024-08-10 10:52:03,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8150, loss[loss=0.1152, beats_loss=0.01207, ecapa_loss=0.0002844, whisper_loss=0.1003, over 18692.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01208, ecapa_loss=0.0002649, whisper_loss=0.09726, over 3882004.04 frames. ], batch size: 72, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:52:32,409 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 10:52:45,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516470.0, ans=0.125 2024-08-10 10:52:51,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=516570.0, ans=15.0 2024-08-10 10:52:51,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-08-10 10:52:58,868 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 from AS 2024-08-10 10:53:05,019 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 from AS 2024-08-10 10:53:05,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=516570.0, ans=6.0 2024-08-10 10:53:10,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=516670.0, ans=0.0 2024-08-10 10:53:14,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=516670.0, ans=0.0 2024-08-10 10:53:23,793 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8200, loss[loss=0.1227, beats_loss=0.01037, ecapa_loss=0.0002953, whisper_loss=0.1093, over 18555.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.012, ecapa_loss=0.0002682, whisper_loss=0.09754, over 3887507.61 frames. 
], batch size: 72, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:54:00,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.913e+01 3.375e+01 3.842e+01 5.271e+01, threshold=6.749e+01, percent-clipped=0.0 2024-08-10 10:54:16,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=517070.0, ans=0.125 2024-08-10 10:54:24,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517070.0, ans=0.1 2024-08-10 10:54:27,418 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 21 from Vox, 22 from AS 2024-08-10 10:54:29,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=517170.0, ans=0.0 2024-08-10 10:54:30,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=517170.0, ans=0.0 2024-08-10 10:54:41,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=517270.0, ans=0.0 2024-08-10 10:54:42,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8250, loss[loss=0.1, beats_loss=0.01507, ecapa_loss=0.0002442, whisper_loss=0.08252, over 19243.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01202, ecapa_loss=0.0002677, whisper_loss=0.09695, over 3895194.04 frames. ], batch size: 77, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:54:54,753 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 10:55:10,468 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 10:55:21,473 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 10:55:27,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-08-10 10:55:47,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=517670.0, ans=0.125 2024-08-10 10:55:54,803 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.000e-01 2024-08-10 10:55:57,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=517670.0, ans=0.0 2024-08-10 10:55:58,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=517670.0, ans=10.0 2024-08-10 10:56:00,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8300, loss[loss=0.1045, beats_loss=0.008564, ecapa_loss=0.0003428, whisper_loss=0.09252, over 14744.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01212, ecapa_loss=0.0002648, whisper_loss=0.09648, over 3903966.16 frames. ], batch size: 61, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:56:06,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=12.0 2024-08-10 10:56:16,229 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 
23 from LS+wenet, 30 from Vox, 44 from AS 2024-08-10 10:56:29,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=517870.0, ans=0.2 2024-08-10 10:56:36,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.966e+01 3.242e+01 3.921e+01 6.642e+01, threshold=6.483e+01, percent-clipped=0.0 2024-08-10 10:56:46,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=517970.0, ans=0.0 2024-08-10 10:56:46,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-10 10:56:58,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-10 10:57:08,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=518170.0, ans=0.0 2024-08-10 10:57:24,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8350, loss[loss=0.0979, beats_loss=0.01383, ecapa_loss=0.0002557, whisper_loss=0.08151, over 18469.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.0122, ecapa_loss=0.0002658, whisper_loss=0.09644, over 3891341.80 frames. ], batch size: 73, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:57:31,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.23 vs. 
limit=22.5 2024-08-10 10:57:38,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=518270.0, ans=0.125 2024-08-10 10:57:55,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=518370.0, ans=0.125 2024-08-10 10:58:00,812 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 10:58:01,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518470.0, ans=0.1 2024-08-10 10:58:04,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2024-08-10 10:58:10,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=518470.0, ans=0.1 2024-08-10 10:58:12,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=518470.0, ans=0.025 2024-08-10 10:58:20,941 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 27 from Vox, 30 from AS 2024-08-10 10:58:53,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518670.0, ans=0.125 2024-08-10 10:58:55,136 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 24 from Vox, 30 from AS 2024-08-10 10:58:59,220 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8400, loss[loss=0.1152, beats_loss=0.01246, ecapa_loss=0.0002524, whisper_loss=0.1002, over 21927.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01217, ecapa_loss=0.0002651, whisper_loss=0.09646, over 3883013.60 frames. 
], batch size: 84, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:59:14,707 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-10 10:59:17,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=518870.0, ans=0.125 2024-08-10 10:59:19,013 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-10 10:59:23,101 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 10:59:37,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=518970.0, ans=0.0 2024-08-10 10:59:40,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 3.091e+01 3.394e+01 4.166e+01 8.578e+01, threshold=6.788e+01, percent-clipped=4.0 2024-08-10 10:59:41,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=518970.0, ans=0.2 2024-08-10 10:59:47,970 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-10 10:59:57,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=519070.0, ans=0.2 2024-08-10 11:00:03,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=519070.0, ans=0.2 2024-08-10 11:00:24,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519170.0, ans=0.0 2024-08-10 11:00:25,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=519270.0, ans=0.07 2024-08-10 11:00:27,185 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8450, loss[loss=0.1307, beats_loss=0.009033, ecapa_loss=0.0002846, whisper_loss=0.1188, over 16503.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002667, whisper_loss=0.0969, over 3843245.79 frames. ], batch size: 61, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:00:28,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2024-08-10 11:00:33,539 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 from AS 2024-08-10 11:00:34,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=519270.0, ans=0.125 2024-08-10 11:00:57,975 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 from AS 2024-08-10 11:01:03,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=519470.0, ans=0.2 2024-08-10 11:01:20,153 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
30 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 11:01:20,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519570.0, ans=0.1 2024-08-10 11:01:20,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-08-10 11:01:25,002 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 from AS 2024-08-10 11:01:29,113 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 11:01:43,318 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 11:01:48,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=519670.0, ans=0.125 2024-08-10 11:01:51,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=519670.0, ans=0.2 2024-08-10 11:01:54,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8500, loss[loss=0.09199, beats_loss=0.01258, ecapa_loss=0.0002433, whisper_loss=0.07697, over 14411.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01207, ecapa_loss=0.0002662, whisper_loss=0.09696, over 3854456.23 frames. ], batch size: 57, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:01:55,016 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 from AS 2024-08-10 11:02:28,085 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 from AS 2024-08-10 11:02:40,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 3.171e+01 3.733e+01 4.165e+01 6.058e+01, threshold=7.466e+01, percent-clipped=0.0 2024-08-10 11:02:44,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519970.0, ans=0.0 2024-08-10 11:02:48,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=519970.0, ans=0.125 2024-08-10 11:02:49,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=520070.0, ans=0.125 2024-08-10 11:02:59,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=520070.0, ans=0.05 2024-08-10 11:03:03,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=520070.0, ans=0.0 2024-08-10 11:03:26,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8550, loss[loss=0.1341, beats_loss=0.01039, ecapa_loss=0.0003229, whisper_loss=0.1204, over 22561.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.0121, ecapa_loss=0.0002644, whisper_loss=0.09668, over 3864198.82 frames. ], batch size: 92, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:03:26,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=520270.0, ans=0.0 2024-08-10 11:03:33,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520270.0, ans=0.125 2024-08-10 11:03:41,787 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-10 11:03:42,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=520370.0, ans=0.125 2024-08-10 11:03:56,293 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 32 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 11:03:57,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=520370.0, ans=15.0 2024-08-10 11:04:12,271 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 11:04:12,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=520470.0, ans=0.125 2024-08-10 11:04:33,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2024-08-10 11:04:46,440 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 11:04:53,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=520670.0, ans=0.0 2024-08-10 11:04:57,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8600, loss[loss=0.1114, beats_loss=0.01269, ecapa_loss=0.000243, whisper_loss=0.0963, over 21118.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01209, ecapa_loss=0.0002629, whisper_loss=0.09687, over 3871687.09 frames. 
], batch size: 85, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:05:00,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=520770.0, ans=0.0 2024-08-10 11:05:08,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=520770.0, ans=0.125 2024-08-10 11:05:12,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=520870.0, ans=0.0 2024-08-10 11:05:36,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 3.011e+01 3.429e+01 3.879e+01 6.555e+01, threshold=6.857e+01, percent-clipped=0.0 2024-08-10 11:05:44,002 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 from AS 2024-08-10 11:05:45,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=520970.0, ans=0.2 2024-08-10 11:05:57,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2024-08-10 11:06:22,140 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 from AS 2024-08-10 11:06:27,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8650, loss[loss=0.1211, beats_loss=0.01111, ecapa_loss=0.0002342, whisper_loss=0.1076, over 18800.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01205, ecapa_loss=0.0002648, whisper_loss=0.09719, over 3862696.54 frames. ], batch size: 73, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:06:34,825 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
25 from LS+wenet, 30 from Vox, 30 from AS 2024-08-10 11:06:43,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=521370.0, ans=0.0 2024-08-10 11:07:11,450 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 26 from Vox, 28 from AS 2024-08-10 11:07:15,574 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 11:07:21,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=521570.0, ans=0.125 2024-08-10 11:07:27,392 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 11:07:38,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=12.0 2024-08-10 11:07:47,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-10 11:07:57,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8700, loss[loss=0.0935, beats_loss=0.0123, ecapa_loss=0.0002654, whisper_loss=0.07855, over 16369.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01203, ecapa_loss=0.0002648, whisper_loss=0.0973, over 3846310.60 frames. ], batch size: 65, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:08:34,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521970.0, ans=0.0 2024-08-10 11:08:36,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.73 vs. 
limit=15.0 2024-08-10 11:08:37,305 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.909e+01 3.289e+01 3.792e+01 9.063e+01, threshold=6.579e+01, percent-clipped=1.0 2024-08-10 11:08:59,683 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS 2024-08-10 11:09:09,931 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 11:09:22,505 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 11:09:25,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8750, loss[loss=0.1291, beats_loss=0.01143, ecapa_loss=0.0002572, whisper_loss=0.1151, over 23634.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01199, ecapa_loss=0.0002681, whisper_loss=0.09736, over 3857406.65 frames. ], batch size: 90, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:09:55,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=522370.0, ans=0.125 2024-08-10 11:09:58,419 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 11:09:58,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=522370.0, ans=0.125 2024-08-10 11:10:00,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=522470.0, ans=0.0 2024-08-10 11:10:25,820 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 11:10:29,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=522570.0, ans=0.125 2024-08-10 11:10:30,770 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 11:10:52,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8800, loss[loss=0.1031, beats_loss=0.01396, ecapa_loss=0.0001796, whisper_loss=0.08734, over 23525.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002667, whisper_loss=0.09692, over 3869076.00 frames. ], batch size: 90, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:10:52,373 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 from AS 2024-08-10 11:11:14,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0 2024-08-10 11:11:25,331 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 from AS 2024-08-10 11:11:25,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=522970.0, ans=0.125 2024-08-10 11:11:27,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=12.0 2024-08-10 11:11:31,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2024-08-10 11:11:32,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.151e+01 3.444e+01 3.946e+01 7.427e+01, threshold=6.887e+01, percent-clipped=2.0 2024-08-10 11:11:53,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=523070.0, ans=0.125 2024-08-10 11:12:10,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. 
limit=15.0 2024-08-10 11:12:18,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=523170.0, ans=0.0 2024-08-10 11:12:21,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8850, loss[loss=0.1053, beats_loss=0.01316, ecapa_loss=0.0002113, whisper_loss=0.08999, over 22125.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01202, ecapa_loss=0.0002663, whisper_loss=0.09684, over 3882973.14 frames. ], batch size: 86, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:12:27,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523270.0, ans=0.125 2024-08-10 11:12:30,091 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 from AS 2024-08-10 11:12:43,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=523370.0, ans=0.125 2024-08-10 11:12:55,627 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 11:13:18,214 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 from AS 2024-08-10 11:13:45,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523670.0, ans=0.1 2024-08-10 11:13:51,197 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8900, loss[loss=0.1204, beats_loss=0.00926, ecapa_loss=0.0002769, whisper_loss=0.1083, over 14631.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01214, ecapa_loss=0.0002627, whisper_loss=0.09593, over 3879504.32 frames. ], batch size: 55, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:13:59,176 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
16 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 11:14:20,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=523870.0, ans=0.125 2024-08-10 11:14:24,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-08-10 11:14:35,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.997e+01 3.258e+01 3.778e+01 5.539e+01, threshold=6.517e+01, percent-clipped=0.0 2024-08-10 11:14:59,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524070.0, ans=0.1 2024-08-10 11:15:17,097 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 11:15:22,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 8950, loss[loss=0.1247, beats_loss=0.01295, ecapa_loss=0.0002497, whisper_loss=0.1092, over 21598.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0123, ecapa_loss=0.0002609, whisper_loss=0.09521, over 3885430.53 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:15:28,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524270.0, ans=0.1 2024-08-10 11:15:38,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.71 vs. limit=10.0 2024-08-10 11:15:48,426 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 11:15:54,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. 
limit=15.0 2024-08-10 11:16:24,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=524570.0, ans=0.0 2024-08-10 11:16:26,384 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 11:16:49,805 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9000, loss[loss=0.1155, beats_loss=0.01328, ecapa_loss=0.0002491, whisper_loss=0.09976, over 19407.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.0122, ecapa_loss=0.0002615, whisper_loss=0.09588, over 3874157.07 frames. ], batch size: 79, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:16:49,805 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 11:17:36,040 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on ASR_libri: loss=0.2658, beats_loss=0, ecapa_loss=0.000793, whisper_loss=0.2579, over 922467.00 frames. 2024-08-10 11:17:54,570 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on SV_voxceleb1: loss=0.007025, beats_loss=0, ecapa_loss=0.0007025, whisper_loss=0, over 939242.00 frames. 2024-08-10 11:19:54,443 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on AT_audioset: loss=0.02753, beats_loss=0.02753, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 11:19:54,446 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 11:20:06,818 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 11:20:32,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. 
limit=10.0 2024-08-10 11:20:33,295 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 3.014e+01 3.320e+01 3.675e+01 5.799e+01, threshold=6.641e+01, percent-clipped=0.0 2024-08-10 11:20:39,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=524970.0, ans=0.0 2024-08-10 11:20:44,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-08-10 11:21:00,533 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 11:21:09,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=525170.0, ans=0.125 2024-08-10 11:21:16,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=525170.0, ans=0.125 2024-08-10 11:21:19,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9050, loss[loss=0.1165, beats_loss=0.009822, ecapa_loss=0.0003192, whisper_loss=0.1035, over 17100.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01212, ecapa_loss=0.000263, whisper_loss=0.09614, over 3854590.03 frames. ], batch size: 67, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:21:22,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=525270.0, ans=0.0 2024-08-10 11:21:44,133 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 11:21:47,990 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 11:21:48,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=525370.0, ans=0.125 2024-08-10 11:21:58,889 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 
32 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 11:22:00,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=525470.0, ans=0.0 2024-08-10 11:22:11,647 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-10 11:22:24,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525570.0, ans=0.125 2024-08-10 11:22:37,805 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-10 11:22:42,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=525770.0, ans=0.0 2024-08-10 11:22:43,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9100, loss[loss=0.1203, beats_loss=0.01034, ecapa_loss=0.0002908, whisper_loss=0.107, over 17854.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01204, ecapa_loss=0.0002656, whisper_loss=0.09673, over 3863460.49 frames. ], batch size: 72, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:22:48,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=525770.0, ans=0.125 2024-08-10 11:23:07,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5 2024-08-10 11:23:09,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525870.0, ans=0.1 2024-08-10 11:23:12,631 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 11:23:12,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=525870.0, ans=0.125 2024-08-10 11:23:16,119 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 11:23:20,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.970e+01 3.325e+01 3.905e+01 6.354e+01, threshold=6.649e+01, percent-clipped=0.0 2024-08-10 11:23:22,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=525970.0, ans=0.1 2024-08-10 11:23:22,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=525970.0, ans=0.0 2024-08-10 11:23:36,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526070.0, ans=0.125 2024-08-10 11:23:37,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=526070.0, ans=0.125 2024-08-10 11:23:42,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=526070.0, ans=0.125 2024-08-10 11:23:46,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=526170.0, ans=0.125 2024-08-10 11:23:51,330 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 11:23:59,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526170.0, ans=0.125 2024-08-10 11:24:03,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9150, loss[loss=0.1126, beats_loss=0.01327, ecapa_loss=0.0002089, whisper_loss=0.09727, over 21507.00 frames. 
], tot_loss[loss=0.1126, beats_loss=0.01194, ecapa_loss=0.0002647, whisper_loss=0.09802, over 3928390.89 frames. ], batch size: 85, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:24:07,171 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 11:24:12,594 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 11:24:29,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=526370.0, ans=0.0 2024-08-10 11:24:29,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=526370.0, ans=0.125 2024-08-10 11:24:34,424 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-10 11:24:35,989 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 11:24:46,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526470.0, ans=0.1 2024-08-10 11:24:48,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=526470.0, ans=0.1 2024-08-10 11:24:53,486 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 11:24:56,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=526570.0, ans=0.2 2024-08-10 11:24:58,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526570.0, ans=0.125 2024-08-10 11:25:16,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=526670.0, ans=0.0 2024-08-10 11:25:17,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=526770.0, ans=0.125 2024-08-10 11:25:18,168 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9200, loss[loss=0.09798, beats_loss=0.01343, ecapa_loss=0.0002379, whisper_loss=0.08218, over 23873.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01202, ecapa_loss=0.0002633, whisper_loss=0.09765, over 3946408.09 frames. ], batch size: 94, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:25:42,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=526870.0, ans=0.125 2024-08-10 11:25:44,608 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.729e+03 2024-08-10 11:25:46,990 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.251e+03 2024-08-10 11:25:47,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=526970.0, ans=0.125 2024-08-10 11:25:49,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.987e+01 3.332e+01 3.744e+01 5.839e+01, threshold=6.663e+01, percent-clipped=0.0 2024-08-10 11:25:55,867 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 11:25:56,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526970.0, ans=0.0 2024-08-10 11:26:02,582 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 11:26:24,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=527270.0, ans=0.125 2024-08-10 11:26:24,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527270.0, ans=0.0 2024-08-10 11:26:24,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9250, loss[loss=0.1349, beats_loss=0.01125, ecapa_loss=0.0002348, whisper_loss=0.1213, over 23096.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01208, ecapa_loss=0.0002634, whisper_loss=0.09724, over 3949511.93 frames. ], batch size: 89, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:26:27,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527270.0, ans=0.1 2024-08-10 11:26:31,506 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 11:26:34,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=527270.0, ans=0.125 2024-08-10 11:26:44,617 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 11:26:44,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=527370.0, ans=0.2 2024-08-10 11:26:47,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.47 vs. 
limit=22.5 2024-08-10 11:27:12,452 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 11:27:17,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2024-08-10 11:27:26,654 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 11:27:30,347 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9300, loss[loss=0.1137, beats_loss=0.01322, ecapa_loss=0.0002501, whisper_loss=0.09797, over 16170.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01214, ecapa_loss=0.0002616, whisper_loss=0.09701, over 3948081.30 frames. ], batch size: 64, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:27:39,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-10 11:27:43,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=527870.0, ans=0.2 2024-08-10 11:27:52,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0 2024-08-10 11:27:55,996 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-10 11:28:02,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.982e+01 3.468e+01 4.140e+01 6.249e+01, threshold=6.936e+01, percent-clipped=0.0 2024-08-10 11:28:09,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=527970.0, ans=0.125 2024-08-10 11:28:16,164 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 11:28:29,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=528170.0, ans=0.04949747468305833 2024-08-10 11:28:41,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9350, loss[loss=0.1222, beats_loss=0.009721, ecapa_loss=0.0003024, whisper_loss=0.1095, over 14834.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01207, ecapa_loss=0.0002623, whisper_loss=0.09719, over 3923917.04 frames. ], batch size: 59, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:28:44,004 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:28:58,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=528370.0, ans=0.125 2024-08-10 11:29:04,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=528370.0, ans=0.025 2024-08-10 11:29:22,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=528470.0, ans=0.0 2024-08-10 11:29:36,665 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 11:29:45,766 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.716e+00 2024-08-10 11:29:45,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=528670.0, ans=0.125 2024-08-10 11:29:47,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=528670.0, ans=0.125 2024-08-10 11:29:53,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9400, loss[loss=0.06927, beats_loss=0.01178, ecapa_loss=0.0003341, whisper_loss=0.05415, over 12405.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01215, ecapa_loss=0.0002615, whisper_loss=0.09671, over 3935417.26 frames. ], batch size: 53, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:30:11,842 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 11:30:31,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 3.113e+01 3.432e+01 4.042e+01 8.997e+01, threshold=6.863e+01, percent-clipped=2.0 2024-08-10 11:30:34,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=528970.0, ans=0.2 2024-08-10 11:30:43,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=529070.0, ans=0.0 2024-08-10 11:30:50,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-10 11:31:10,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9450, loss[loss=0.1027, beats_loss=0.01312, ecapa_loss=0.000216, whisper_loss=0.08746, over 23616.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01213, ecapa_loss=0.0002608, whisper_loss=0.09698, over 3928845.82 frames. 
], batch size: 93, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:31:33,683 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 11:31:47,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=529470.0, ans=0.125 2024-08-10 11:32:07,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2024-08-10 11:32:23,510 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 11:32:27,196 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9500, loss[loss=0.1232, beats_loss=0.009039, ecapa_loss=0.0002748, whisper_loss=0.1114, over 16250.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01215, ecapa_loss=0.0002602, whisper_loss=0.09688, over 3930323.87 frames. ], batch size: 62, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:32:33,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=529770.0, ans=0.125 2024-08-10 11:32:35,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=529770.0, ans=0.0 2024-08-10 11:32:49,317 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:32:49,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529870.0, ans=0.0 2024-08-10 11:33:00,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.898e+01 3.217e+01 3.700e+01 5.976e+01, threshold=6.434e+01, percent-clipped=0.0 2024-08-10 11:33:06,888 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 11:33:09,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=530070.0, ans=0.95 2024-08-10 11:33:16,984 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-10 11:33:24,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=530170.0, ans=0.125 2024-08-10 11:33:37,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=22.5 2024-08-10 11:33:38,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9550, loss[loss=0.1222, beats_loss=0.00813, ecapa_loss=0.0002569, whisper_loss=0.1115, over 16293.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01213, ecapa_loss=0.0002606, whisper_loss=0.09703, over 3946396.62 frames. ], batch size: 61, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:33:50,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=530370.0, ans=0.125 2024-08-10 11:34:12,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530470.0, ans=0.1 2024-08-10 11:34:13,543 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 11:34:17,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=530570.0, ans=0.2 2024-08-10 11:34:21,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. 
limit=6.0 2024-08-10 11:34:23,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=530570.0, ans=0.2 2024-08-10 11:34:42,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-08-10 11:34:44,410 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 11:34:45,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9600, loss[loss=0.1074, beats_loss=0.01287, ecapa_loss=0.0002057, whisper_loss=0.09244, over 16493.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01215, ecapa_loss=0.0002606, whisper_loss=0.09641, over 3897920.92 frames. ], batch size: 61, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:34:46,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530770.0, ans=0.1 2024-08-10 11:35:16,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.859e+01 3.374e+01 4.050e+01 6.854e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 11:35:46,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=531170.0, ans=0.125 2024-08-10 11:35:51,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9650, loss[loss=0.1234, beats_loss=0.01077, ecapa_loss=0.0002445, whisper_loss=0.1102, over 22003.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01205, ecapa_loss=0.0002632, whisper_loss=0.09669, over 3896320.18 frames. 
], batch size: 85, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:35:51,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531270.0, ans=0.125 2024-08-10 11:35:53,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=531270.0, ans=0.2 2024-08-10 11:35:53,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=531270.0, ans=0.0 2024-08-10 11:35:53,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-10 11:35:55,478 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-10 11:35:57,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-08-10 11:35:57,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-10 11:36:11,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=531370.0, ans=0.125 2024-08-10 11:36:12,977 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 11:36:35,781 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 11:36:43,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2024-08-10 11:36:53,539 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-10 11:36:55,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9700, loss[loss=0.09228, beats_loss=0.009763, ecapa_loss=0.0003182, whisper_loss=0.07933, over 15672.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01202, ecapa_loss=0.0002637, whisper_loss=0.09719, over 3907819.76 frames. ], batch size: 63, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:36:55,166 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 11:36:56,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=531770.0, ans=0.125 2024-08-10 11:37:15,824 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 11:37:24,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.979e+01 3.402e+01 3.794e+01 6.549e+01, threshold=6.804e+01, percent-clipped=0.0 2024-08-10 11:37:55,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0 2024-08-10 11:38:00,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9750, loss[loss=0.1179, beats_loss=0.01196, ecapa_loss=0.0002422, whisper_loss=0.1036, over 23502.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01203, ecapa_loss=0.0002628, whisper_loss=0.09672, over 3885230.58 frames. ], batch size: 91, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:38:22,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=532370.0, ans=15.0 2024-08-10 11:38:24,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=532370.0, ans=0.0 2024-08-10 11:38:34,629 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 11:38:44,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=532570.0, ans=0.0 2024-08-10 11:38:47,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=532570.0, ans=0.0 2024-08-10 11:38:47,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=532570.0, ans=0.0 2024-08-10 11:38:54,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=532670.0, ans=0.2 2024-08-10 11:38:57,056 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 11:39:05,301 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 11:39:06,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9800, loss[loss=0.1216, beats_loss=0.01174, ecapa_loss=0.0002069, whisper_loss=0.1078, over 18995.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.012, ecapa_loss=0.0002639, whisper_loss=0.09669, over 3861808.64 frames. 
], batch size: 69, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:39:07,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=532770.0, ans=0.125 2024-08-10 11:39:27,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=532870.0, ans=0.2 2024-08-10 11:39:36,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.053e+01 3.396e+01 3.815e+01 6.772e+01, threshold=6.792e+01, percent-clipped=0.0 2024-08-10 11:39:39,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=532970.0, ans=0.125 2024-08-10 11:39:39,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=532970.0, ans=0.05 2024-08-10 11:39:46,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=12.0 2024-08-10 11:39:50,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=533070.0, ans=0.0 2024-08-10 11:39:51,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=533070.0, ans=0.125 2024-08-10 11:39:56,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=533070.0, ans=0.0 2024-08-10 11:40:03,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=533170.0, ans=0.0 2024-08-10 11:40:10,098 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
18 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 11:40:11,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9850, loss[loss=0.08935, beats_loss=0.01575, ecapa_loss=0.0002144, whisper_loss=0.07145, over 19161.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01198, ecapa_loss=0.0002639, whisper_loss=0.09703, over 3856169.68 frames. ], batch size: 79, lr: 1.46e-02, grad_scale: 2147483648.0 2024-08-10 11:40:16,721 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 11:40:21,978 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 11:40:27,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=22.5 2024-08-10 11:40:32,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=12.0 2024-08-10 11:40:34,591 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 11:40:37,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=533470.0, ans=0.5 2024-08-10 11:40:47,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533470.0, ans=0.125 2024-08-10 11:40:47,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=533470.0, ans=0.125 2024-08-10 11:41:09,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=533670.0, ans=0.125 2024-08-10 11:41:13,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=15.0
2024-08-10 11:41:15,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9900, loss[loss=0.1145, beats_loss=0.0132, ecapa_loss=0.0001901, whisper_loss=0.09939, over 22419.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01198, ecapa_loss=0.0002624, whisper_loss=0.09684, over 3843637.91 frames. ], batch size: 87, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:41:16,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
2024-08-10 11:41:17,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533770.0, ans=0.125
2024-08-10 11:41:24,854 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 from AS
2024-08-10 11:41:32,799 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 from AS
2024-08-10 11:41:45,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.867e+01 3.339e+01 3.780e+01 5.864e+01, threshold=6.678e+01, percent-clipped=0.0
2024-08-10 11:41:46,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=533970.0, ans=0.2
2024-08-10 11:41:47,228 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS
2024-08-10 11:41:51,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=533970.0, ans=0.0
2024-08-10 11:41:51,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.58 vs.
limit=15.0
2024-08-10 11:42:05,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=534070.0, ans=0.2
2024-08-10 11:42:12,958 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 from AS
2024-08-10 11:42:13,439 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0
2024-08-10 11:42:20,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 9950, loss[loss=0.1032, beats_loss=0.01246, ecapa_loss=0.000198, whisper_loss=0.08871, over 16373.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01215, ecapa_loss=0.0002616, whisper_loss=0.09594, over 3846616.53 frames. ], batch size: 61, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:42:29,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=534270.0, ans=0.125
2024-08-10 11:42:34,861 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 32 from Vox, 30 from AS
2024-08-10 11:42:45,417 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-10 11:42:52,044 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.152e-01
2024-08-10 11:43:03,463 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 from AS
2024-08-10 11:43:07,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=534570.0, ans=0.0
2024-08-10 11:43:15,078 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 40 from LS+wenet, 21 from Vox, 28 from AS
2024-08-10 11:43:23,100 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
23 from LS+wenet, 21 from Vox, 45 from AS
2024-08-10 11:43:25,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10000, loss[loss=0.1123, beats_loss=0.0114, ecapa_loss=0.0002723, whisper_loss=0.09816, over 21744.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01209, ecapa_loss=0.0002623, whisper_loss=0.09673, over 3860218.61 frames. ], batch size: 87, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:43:33,095 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 30 from LS+wenet, 18 from Vox, 26 from AS
2024-08-10 11:43:55,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.932e+01 3.270e+01 3.845e+01 5.958e+01, threshold=6.541e+01, percent-clipped=0.0
2024-08-10 11:44:02,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=534970.0, ans=0.05
2024-08-10 11:44:08,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=535070.0, ans=0.0
2024-08-10 11:44:23,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0
2024-08-10 11:44:25,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=535170.0, ans=0.0
2024-08-10 11:44:30,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10050, loss[loss=0.1196, beats_loss=0.01257, ecapa_loss=0.0002126, whisper_loss=0.1049, over 17986.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01198, ecapa_loss=0.0002618, whisper_loss=0.09781, over 3864047.50 frames.
], batch size: 70, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:44:30,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=535270.0, ans=0.0
2024-08-10 11:44:32,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=535270.0, ans=0.0
2024-08-10 11:44:57,153 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 17 from Vox, 33 from AS
2024-08-10 11:45:24,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=535670.0, ans=0.2
2024-08-10 11:45:26,861 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 from AS
2024-08-10 11:45:30,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=535670.0, ans=0.125
2024-08-10 11:45:35,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10100, loss[loss=0.1072, beats_loss=0.01364, ecapa_loss=0.0002552, whisper_loss=0.09104, over 18818.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.012, ecapa_loss=0.0002615, whisper_loss=0.09808, over 3893336.39 frames. ], batch size: 76, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:45:43,409 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 11:45:44,357 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
38 from LS+wenet, 16 from Vox, 35 from AS
2024-08-10 11:45:47,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=535770.0, ans=0.125
2024-08-10 11:45:49,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=535870.0, ans=0.125
2024-08-10 11:46:05,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.067e+01 3.527e+01 4.291e+01 1.159e+02, threshold=7.053e+01, percent-clipped=2.0
2024-08-10 11:46:08,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=535970.0, ans=0.125
2024-08-10 11:46:13,724 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 from AS
2024-08-10 11:46:22,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=536070.0, ans=0.1
2024-08-10 11:46:29,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=536170.0, ans=0.125
2024-08-10 11:46:37,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=536170.0, ans=0.125
2024-08-10 11:46:39,321 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-10 11:46:39,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=536270.0, ans=0.125
2024-08-10 11:46:40,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10150, loss[loss=0.1136, beats_loss=0.01117, ecapa_loss=0.0002425, whisper_loss=0.1, over 20204.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01201, ecapa_loss=0.0002625, whisper_loss=0.09689, over 3910477.05 frames.
], batch size: 82, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:46:51,407 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS
2024-08-10 11:46:51,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=536270.0, ans=0.0
2024-08-10 11:46:56,164 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 11 from Vox, 29 from AS
2024-08-10 11:47:05,201 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 from AS
2024-08-10 11:47:05,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=536370.0, ans=0.2
2024-08-10 11:47:05,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=536370.0, ans=0.125
2024-08-10 11:47:09,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=22.5
2024-08-10 11:47:18,449 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-10 11:47:25,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=536570.0, ans=0.125
2024-08-10 11:47:27,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=536570.0, ans=0.025
2024-08-10 11:47:39,448 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 from AS
2024-08-10 11:47:40,748 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
29 from LS+wenet, 18 from Vox, 44 from AS
2024-08-10 11:47:42,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=536670.0, ans=0.0
2024-08-10 11:47:56,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=536770.0, ans=0.1
2024-08-10 11:47:57,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10200, loss[loss=0.1075, beats_loss=0.01338, ecapa_loss=0.0002516, whisper_loss=0.09156, over 20395.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01195, ecapa_loss=0.0002612, whisper_loss=0.09787, over 3911388.94 frames. ], batch size: 80, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:47:59,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=536770.0, ans=0.125
2024-08-10 11:48:03,181 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 11:48:04,639 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS
2024-08-10 11:48:17,336 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 27 from Vox, 28 from AS
2024-08-10 11:48:34,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 3.065e+01 3.411e+01 3.914e+01 6.071e+01, threshold=6.821e+01, percent-clipped=0.0
2024-08-10 11:48:39,825 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 20 from Vox, 33 from AS
2024-08-10 11:48:45,696 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts.
27 from LS+wenet, 23 from Vox, 37 from AS
2024-08-10 11:48:54,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=537070.0, ans=0.125
2024-08-10 11:49:20,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10250, loss[loss=0.1112, beats_loss=0.0133, ecapa_loss=0.000246, whisper_loss=0.09543, over 21790.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.012, ecapa_loss=0.0002622, whisper_loss=0.09726, over 3886476.53 frames. ], batch size: 86, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:49:36,913 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS
2024-08-10 11:50:16,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5
2024-08-10 11:50:17,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=537570.0, ans=0.125
2024-08-10 11:50:30,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=537670.0, ans=0.125
2024-08-10 11:50:31,973 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 from AS
2024-08-10 11:50:34,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=537670.0, ans=0.0
2024-08-10 11:50:37,382 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 33 from Vox, 31 from AS
2024-08-10 11:50:37,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=537670.0, ans=0.125
2024-08-10 11:50:44,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10300, loss[loss=0.09793, beats_loss=0.008353, ecapa_loss=0.0002798, whisper_loss=0.08678, over 14723.00 frames.
], tot_loss[loss=0.1121, beats_loss=0.01193, ecapa_loss=0.0002623, whisper_loss=0.09752, over 3869741.14 frames. ], batch size: 56, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:50:46,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=537770.0, ans=0.07
2024-08-10 11:51:15,329 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 from AS
2024-08-10 11:51:18,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=537970.0, ans=0.125
2024-08-10 11:51:19,934 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.493e-03
2024-08-10 11:51:20,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.115e+01 3.523e+01 4.089e+01 1.199e+02, threshold=7.045e+01, percent-clipped=1.0
2024-08-10 11:51:26,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537970.0, ans=0.125
2024-08-10 11:51:28,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=537970.0, ans=0.2
2024-08-10 11:51:36,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=538070.0, ans=0.125
2024-08-10 11:51:44,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0
2024-08-10 11:51:47,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=538170.0, ans=0.125
2024-08-10 11:52:04,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10350, loss[loss=0.1186, beats_loss=0.01199, ecapa_loss=0.0002483, whisper_loss=0.1041, over 21865.00 frames.
], tot_loss[loss=0.1117, beats_loss=0.012, ecapa_loss=0.0002613, whisper_loss=0.0971, over 3885626.93 frames. ], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:52:29,918 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.243e-01
2024-08-10 11:52:48,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2024-08-10 11:52:55,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0
2024-08-10 11:53:06,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538570.0, ans=0.125
2024-08-10 11:53:22,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538670.0, ans=0.1
2024-08-10 11:53:25,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10400, loss[loss=0.1098, beats_loss=0.01016, ecapa_loss=0.0002606, whisper_loss=0.09707, over 19326.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01198, ecapa_loss=0.000261, whisper_loss=0.09652, over 3886561.34 frames. ], batch size: 80, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:53:28,503 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 26 from Vox, 21 from AS
2024-08-10 11:53:51,660 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts.
23 from LS+wenet, 22 from Vox, 38 from AS
2024-08-10 11:53:58,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=538970.0, ans=0.2
2024-08-10 11:54:00,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=538970.0, ans=0.07
2024-08-10 11:54:01,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.892e+01 3.209e+01 3.631e+01 5.476e+01, threshold=6.418e+01, percent-clipped=0.0
2024-08-10 11:54:15,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539070.0, ans=0.125
2024-08-10 11:54:19,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=539070.0, ans=0.04949747468305833
2024-08-10 11:54:19,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=539070.0, ans=0.2
2024-08-10 11:54:24,071 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS
2024-08-10 11:54:33,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539170.0, ans=0.1
2024-08-10 11:54:35,364 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 from AS
2024-08-10 11:54:35,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=539170.0, ans=0.0
2024-08-10 11:54:44,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10450, loss[loss=0.09136, beats_loss=0.01298, ecapa_loss=0.0003028, whisper_loss=0.07535, over 19551.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01192, ecapa_loss=0.0002604, whisper_loss=0.09654, over 3870429.30 frames.
], batch size: 86, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:54:49,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=539270.0, ans=0.0
2024-08-10 11:54:58,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0
2024-08-10 11:55:03,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=539370.0, ans=0.0
2024-08-10 11:55:09,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539370.0, ans=0.1
2024-08-10 11:55:16,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=539470.0, ans=0.125
2024-08-10 11:55:37,227 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS
2024-08-10 11:55:55,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=539670.0, ans=0.07
2024-08-10 11:55:57,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539670.0, ans=0.125
2024-08-10 11:56:01,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10500, loss[loss=0.1148, beats_loss=0.01104, ecapa_loss=0.000248, whisper_loss=0.1013, over 20022.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01191, ecapa_loss=0.0002607, whisper_loss=0.09662, over 3871868.47 frames. ], batch size: 78, lr: 1.45e-02, grad_scale: 2147483648.0
2024-08-10 11:56:05,747 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts.
16 from LS+wenet, 20 from Vox, 34 from AS
2024-08-10 11:56:06,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=539770.0, ans=0.125
2024-08-10 11:56:07,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539770.0, ans=0.1
2024-08-10 11:56:10,989 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 12 from Vox, 34 from AS
2024-08-10 11:56:33,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 3.038e+01 3.459e+01 3.996e+01 6.342e+01, threshold=6.919e+01, percent-clipped=0.0
2024-08-10 11:56:54,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=540070.0, ans=0.0
2024-08-10 11:57:00,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540170.0, ans=0.1
2024-08-10 11:57:09,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10550, loss[loss=0.1063, beats_loss=0.01414, ecapa_loss=0.0001815, whisper_loss=0.09038, over 22909.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01191, ecapa_loss=0.0002609, whisper_loss=0.09646, over 3858499.73 frames. ], batch size: 90, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:57:39,858 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 from AS
2024-08-10 11:57:41,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0
2024-08-10 11:57:42,241 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS
2024-08-10 11:57:49,434 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
31 from LS+wenet, 26 from Vox, 33 from AS
2024-08-10 11:57:53,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=540570.0, ans=0.2
2024-08-10 11:57:55,683 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 from AS
2024-08-10 11:58:08,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=540670.0, ans=0.125
2024-08-10 11:58:18,500 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10600, loss[loss=0.1217, beats_loss=0.01225, ecapa_loss=0.0002215, whisper_loss=0.1072, over 13990.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.0119, ecapa_loss=0.0002637, whisper_loss=0.09656, over 3866626.54 frames. ], batch size: 54, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:58:25,300 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS
2024-08-10 11:58:36,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=540870.0, ans=0.125
2024-08-10 11:58:45,812 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS
2024-08-10 11:58:47,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0
2024-08-10 11:58:49,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.966e+01 3.321e+01 3.773e+01 6.212e+01, threshold=6.641e+01, percent-clipped=0.0
2024-08-10 11:58:54,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs.
limit=10.0
2024-08-10 11:59:23,285 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.261e+05
2024-08-10 11:59:25,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10650, loss[loss=0.1133, beats_loss=0.0134, ecapa_loss=0.0002728, whisper_loss=0.09719, over 21187.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01192, ecapa_loss=0.0002615, whisper_loss=0.09711, over 3876492.38 frames. ], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:59:37,376 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 from AS
2024-08-10 11:59:54,717 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 from AS
2024-08-10 11:59:57,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541470.0, ans=0.1
2024-08-10 12:00:00,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=541470.0, ans=0.125
2024-08-10 12:00:25,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541670.0, ans=0.1
2024-08-10 12:00:31,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10700, loss[loss=0.1118, beats_loss=0.01075, ecapa_loss=0.0002131, whisper_loss=0.09888, over 18881.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01187, ecapa_loss=0.0002621, whisper_loss=0.09709, over 3860067.32 frames. ], batch size: 69, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:00:33,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs.
limit=15.0
2024-08-10 12:00:52,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=541870.0, ans=0.0
2024-08-10 12:00:56,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541870.0, ans=0.1
2024-08-10 12:00:57,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=541970.0, ans=0.125
2024-08-10 12:00:58,958 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 from AS
2024-08-10 12:01:03,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 3.156e+01 3.555e+01 4.088e+01 6.627e+01, threshold=7.109e+01, percent-clipped=0.0
2024-08-10 12:01:12,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=542070.0, ans=0.0
2024-08-10 12:01:16,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=542070.0, ans=0.0
2024-08-10 12:01:24,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0
2024-08-10 12:01:31,892 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.399e-02
2024-08-10 12:01:33,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=542170.0, ans=0.125
2024-08-10 12:01:35,686 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS
2024-08-10 12:01:39,363 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10750, loss[loss=0.07364, beats_loss=0.01657, ecapa_loss=0.000247, whisper_loss=0.0546, over 14934.00 frames.
], tot_loss[loss=0.1121, beats_loss=0.01178, ecapa_loss=0.000262, whisper_loss=0.09771, over 3878875.21 frames. ], batch size: 66, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:01:45,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.27 vs. limit=22.5
2024-08-10 12:01:51,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=542370.0, ans=0.0
2024-08-10 12:01:54,049 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 22 from Vox, 31 from AS
2024-08-10 12:01:58,274 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 from AS
2024-08-10 12:02:05,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542470.0, ans=0.1
2024-08-10 12:02:17,503 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.152e-01
2024-08-10 12:02:23,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=542570.0, ans=0.0
2024-08-10 12:02:27,611 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 20 from LS+wenet, 23 from Vox, 51 from AS
2024-08-10 12:02:42,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0
2024-08-10 12:02:45,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10800, loss[loss=0.09882, beats_loss=0.01091, ecapa_loss=0.0003043, whisper_loss=0.08487, over 17382.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01191, ecapa_loss=0.0002605, whisper_loss=0.09693, over 3901815.14 frames.
], batch size: 73, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:02:47,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=542770.0, ans=0.2
2024-08-10 12:03:11,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0
2024-08-10 12:03:17,118 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.065e+01 3.589e+01 4.278e+01 6.968e+01, threshold=7.178e+01, percent-clipped=0.0
2024-08-10 12:03:23,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0
2024-08-10 12:03:26,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=543070.0, ans=0.2
2024-08-10 12:03:30,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=543070.0, ans=0.125
2024-08-10 12:03:53,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10850, loss[loss=0.1044, beats_loss=0.01203, ecapa_loss=0.0002179, whisper_loss=0.0902, over 22281.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01189, ecapa_loss=0.0002615, whisper_loss=0.09755, over 3923151.51 frames. ], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:04:01,164 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 from AS
2024-08-10 12:04:13,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=543370.0, ans=0.125
2024-08-10 12:04:15,916 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts.
32 from LS+wenet, 17 from Vox, 31 from AS
2024-08-10 12:04:19,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2024-08-10 12:04:29,296 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS
2024-08-10 12:04:39,275 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 from AS
2024-08-10 12:04:40,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=543570.0, ans=0.125
2024-08-10 12:04:48,123 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 from AS
2024-08-10 12:04:48,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=543670.0, ans=0.125
2024-08-10 12:04:58,280 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 33 from LS+wenet, 19 from Vox, 26 from AS
2024-08-10 12:05:02,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=543770.0, ans=0.0
2024-08-10 12:05:03,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10900, loss[loss=0.1112, beats_loss=0.01077, ecapa_loss=0.0002658, whisper_loss=0.09777, over 22422.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01189, ecapa_loss=0.0002625, whisper_loss=0.09773, over 3934728.03 frames. ], batch size: 90, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:05:03,177 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts.
26 from LS+wenet, 20 from Vox, 33 from AS
2024-08-10 12:05:17,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=543870.0, ans=0.2
2024-08-10 12:05:34,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=543970.0, ans=0.125
2024-08-10 12:05:35,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 3.064e+01 3.469e+01 3.864e+01 6.688e+01, threshold=6.938e+01, percent-clipped=0.0
2024-08-10 12:05:35,218 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 from AS
2024-08-10 12:05:48,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=544070.0, ans=0.1
2024-08-10 12:06:13,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 10950, loss[loss=0.1173, beats_loss=0.01344, ecapa_loss=0.0002543, whisper_loss=0.1014, over 20821.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01184, ecapa_loss=0.0002623, whisper_loss=0.09819, over 3941589.95 frames. ], batch size: 83, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:06:25,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=544370.0, ans=0.125
2024-08-10 12:06:28,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=544370.0, ans=0.0
2024-08-10 12:06:44,567 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-10 12:07:09,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=544670.0, ans=0.05
2024-08-10 12:07:18,472 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts.
22 from LS+wenet, 17 from Vox, 27 from AS
2024-08-10 12:07:19,634 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11000, loss[loss=0.1138, beats_loss=0.01096, ecapa_loss=0.0002353, whisper_loss=0.1005, over 17358.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01186, ecapa_loss=0.0002631, whisper_loss=0.09778, over 3958101.90 frames. ], batch size: 66, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:07:22,761 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 12:07:23,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=544770.0, ans=0.125
2024-08-10 12:07:25,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=544770.0, ans=0.125
2024-08-10 12:07:35,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.28 vs.
limit=15.0
2024-08-10 12:07:42,987 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 12:07:49,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=544970.0, ans=0.125
2024-08-10 12:07:50,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.903e+01 3.309e+01 3.802e+01 5.297e+01, threshold=6.618e+01, percent-clipped=0.0
2024-08-10 12:07:50,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=544970.0, ans=0.125
2024-08-10 12:07:56,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=544970.0, ans=0.2
2024-08-10 12:08:01,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=545070.0, ans=0.0
2024-08-10 12:08:19,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=545170.0, ans=0.125
2024-08-10 12:08:20,701 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 9 from Vox, 31 from AS
2024-08-10 12:08:25,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11050, loss[loss=0.1225, beats_loss=0.01171, ecapa_loss=0.0002576, whisper_loss=0.1082, over 23127.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01196, ecapa_loss=0.0002607, whisper_loss=0.0973, over 3932873.03 frames.
], batch size: 93, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:08:27,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=545270.0, ans=0.125
2024-08-10 12:08:56,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=545470.0, ans=0.0
2024-08-10 12:09:13,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=545570.0, ans=0.0
2024-08-10 12:09:17,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=545670.0, ans=0.125
2024-08-10 12:09:21,173 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 12:09:21,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=545670.0, ans=0.125
2024-08-10 12:09:25,123 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 14 from LS+wenet, 30 from Vox, 40 from AS
2024-08-10 12:09:28,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=545670.0, ans=0.1
2024-08-10 12:09:31,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11100, loss[loss=0.1248, beats_loss=0.01175, ecapa_loss=0.0002415, whisper_loss=0.1107, over 24350.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01197, ecapa_loss=0.00026, whisper_loss=0.09675, over 3931827.10 frames.
], batch size: 94, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:09:34,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545770.0, ans=0.1
2024-08-10 12:09:57,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=545970.0, ans=0.125
2024-08-10 12:10:02,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.025e+01 3.487e+01 4.357e+01 7.811e+01, threshold=6.974e+01, percent-clipped=1.0
2024-08-10 12:10:03,997 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 from AS
2024-08-10 12:10:07,823 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 from AS
2024-08-10 12:10:20,006 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 41 from LS+wenet, 19 from Vox, 29 from AS
2024-08-10 12:10:21,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=546070.0, ans=0.2
2024-08-10 12:10:23,882 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 from AS
2024-08-10 12:10:24,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546170.0, ans=0.1
2024-08-10 12:10:38,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11150, loss[loss=0.1123, beats_loss=0.01262, ecapa_loss=0.0002321, whisper_loss=0.09736, over 23363.00 frames. ], tot_loss[loss=0.112, beats_loss=0.0119, ecapa_loss=0.0002605, whisper_loss=0.09748, over 3920702.17 frames. ], batch size: 93, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:10:48,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.39 vs.
limit=15.0
2024-08-10 12:10:52,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546370.0, ans=0.125
2024-08-10 12:11:04,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=546470.0, ans=0.0
2024-08-10 12:11:05,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=546470.0, ans=0.125
2024-08-10 12:11:07,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5
2024-08-10 12:11:13,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0
2024-08-10 12:11:30,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=546670.0, ans=0.125
2024-08-10 12:11:31,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546670.0, ans=0.1
2024-08-10 12:11:32,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5
2024-08-10 12:11:33,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=546670.0, ans=0.04949747468305833
2024-08-10 12:11:33,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546670.0, ans=0.125
2024-08-10 12:11:38,727 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts.
22 from LS+wenet, 24 from Vox, 36 from AS
2024-08-10 12:11:44,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11200, loss[loss=0.1052, beats_loss=0.01268, ecapa_loss=0.0002469, whisper_loss=0.09002, over 18296.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01194, ecapa_loss=0.0002605, whisper_loss=0.09718, over 3914305.23 frames. ], batch size: 74, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:11:50,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546770.0, ans=0.1
2024-08-10 12:12:10,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2024-08-10 12:12:12,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.20 vs. limit=15.0
2024-08-10 12:12:15,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 3.126e+01 3.422e+01 3.938e+01 7.786e+01, threshold=6.843e+01, percent-clipped=1.0
2024-08-10 12:12:25,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0
2024-08-10 12:12:41,473 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS
2024-08-10 12:12:43,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2024-08-10 12:12:51,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11250, loss[loss=0.1237, beats_loss=0.01299, ecapa_loss=0.0002151, whisper_loss=0.1086, over 16591.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01201, ecapa_loss=0.0002586, whisper_loss=0.09739, over 3912559.36 frames.
], batch size: 65, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:13:01,601 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 from AS
2024-08-10 12:13:17,727 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-10 12:13:21,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=547470.0, ans=0.125
2024-08-10 12:13:26,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547470.0, ans=0.125
2024-08-10 12:13:28,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=12.0
2024-08-10 12:13:34,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=547570.0, ans=10.0
2024-08-10 12:13:37,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547570.0, ans=0.1
2024-08-10 12:13:49,724 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 from AS
2024-08-10 12:13:51,074 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 from AS
2024-08-10 12:13:58,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11300, loss[loss=0.1102, beats_loss=0.01214, ecapa_loss=0.0002317, whisper_loss=0.09574, over 19384.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01196, ecapa_loss=0.0002582, whisper_loss=0.09708, over 3886597.74 frames. ], batch size: 78, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:14:05,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.73 vs.
limit=15.0
2024-08-10 12:14:06,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0
2024-08-10 12:14:22,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.34 vs. limit=12.0
2024-08-10 12:14:30,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.068e+01 3.483e+01 4.119e+01 9.369e+01, threshold=6.966e+01, percent-clipped=1.0
2024-08-10 12:14:33,020 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 from AS
2024-08-10 12:14:43,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=548070.0, ans=0.125
2024-08-10 12:14:53,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=548170.0, ans=0.0
2024-08-10 12:14:56,797 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 from AS
2024-08-10 12:14:58,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=548170.0, ans=0.125
2024-08-10 12:15:05,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11350, loss[loss=0.1341, beats_loss=0.0119, ecapa_loss=0.0002394, whisper_loss=0.1198, over 23391.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01205, ecapa_loss=0.0002595, whisper_loss=0.09668, over 3895359.64 frames. ], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:15:07,172 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
23 from LS+wenet, 16 from Vox, 30 from AS
2024-08-10 12:15:15,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548270.0, ans=0.1
2024-08-10 12:15:27,963 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS
2024-08-10 12:15:38,873 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 from AS
2024-08-10 12:15:57,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2024-08-10 12:16:11,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11400, loss[loss=0.1061, beats_loss=0.0125, ecapa_loss=0.0002004, whisper_loss=0.09161, over 24131.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01201, ecapa_loss=0.0002591, whisper_loss=0.0967, over 3894279.30 frames. ], batch size: 94, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:16:17,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0
2024-08-10 12:16:20,333 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 13 from Vox, 36 from AS
2024-08-10 12:16:20,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=548770.0, ans=0.125
2024-08-10 12:16:20,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548770.0, ans=0.1
2024-08-10 12:16:28,232 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts.
21 from LS+wenet, 15 from Vox, 20 from AS
2024-08-10 12:16:31,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=548870.0, ans=0.2
2024-08-10 12:16:31,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 12:16:39,182 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.100e-01
2024-08-10 12:16:42,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.940e+01 3.301e+01 3.929e+01 5.377e+01, threshold=6.601e+01, percent-clipped=0.0
2024-08-10 12:16:43,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=548970.0, ans=0.015
2024-08-10 12:16:51,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549070.0, ans=0.1
2024-08-10 12:16:52,987 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 from AS
2024-08-10 12:17:03,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.73 vs. limit=22.5
2024-08-10 12:17:17,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=549270.0, ans=0.0
2024-08-10 12:17:18,466 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11450, loss[loss=0.1077, beats_loss=0.01377, ecapa_loss=0.0002303, whisper_loss=0.09161, over 18998.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01207, ecapa_loss=0.0002571, whisper_loss=0.09652, over 3900034.47 frames.
], batch size: 75, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:17:37,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=549370.0, ans=0.0
2024-08-10 12:17:50,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=549470.0, ans=0.125
2024-08-10 12:18:04,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=549570.0, ans=0.04949747468305833
2024-08-10 12:18:09,609 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS
2024-08-10 12:18:13,599 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 from AS
2024-08-10 12:18:26,123 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11500, loss[loss=0.07669, beats_loss=0.01468, ecapa_loss=0.0002421, whisper_loss=0.05959, over 19486.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01207, ecapa_loss=0.0002563, whisper_loss=0.09634, over 3884607.15 frames. ], batch size: 78, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:18:29,097 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 from AS
2024-08-10 12:18:36,504 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS
2024-08-10 12:18:38,089 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
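Editor's note on the many `ScheduledFloat` records: each one reports a value (`ans=…`) as a function of the current `batch_count`, e.g. skip rates that sit at 0.0 late in training. A sketch of the idea, under the assumption that the scheduled value is piecewise-linear in batch count; the function `scheduled_float` and the breakpoints below are illustrative, not icefall's actual `ScheduledFloat` class:

```python
def scheduled_float(batch_count, schedule):
    """Piecewise-linear value as a function of batch count.

    `schedule` is a list of (batch_count, value) breakpoints; outside
    that range the endpoint values are held constant. Illustrative
    sketch only, not icefall's implementation.
    """
    points = sorted(schedule)
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    # Linearly interpolate between the two surrounding breakpoints.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# A hypothetical skip rate that starts at 0.2 and decays to 0.0 by
# batch 20000 has long since reached 0.0 at batch_count=542370.0,
# matching the many "ans=0.0" records in the log.
rate = scheduled_float(542370.0, [(0.0, 0.2), (20000.0, 0.0)])
```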
21 from LS+wenet, 17 from Vox, 31 from AS
2024-08-10 12:18:45,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=549870.0, ans=0.125
2024-08-10 12:18:48,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=549870.0, ans=0.0
2024-08-10 12:18:56,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.942e+01 3.473e+01 3.989e+01 7.170e+01, threshold=6.945e+01, percent-clipped=1.0
2024-08-10 12:19:03,435 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 24 from Vox, 37 from AS
2024-08-10 12:19:04,759 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 from AS
2024-08-10 12:19:12,506 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 from AS
2024-08-10 12:19:15,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=550070.0, ans=0.125
2024-08-10 12:19:29,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0
2024-08-10 12:19:32,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11550, loss[loss=0.1104, beats_loss=0.01618, ecapa_loss=0.0002203, whisper_loss=0.09201, over 15442.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01205, ecapa_loss=0.0002585, whisper_loss=0.09643, over 3856910.29 frames. ], batch size: 63, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:19:34,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=550270.0, ans=0.125
2024-08-10 12:19:46,962 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
27 from LS+wenet, 21 from Vox, 40 from AS
2024-08-10 12:19:52,366 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 from AS
2024-08-10 12:20:06,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=550470.0, ans=0.0
2024-08-10 12:20:09,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=550470.0, ans=0.0
2024-08-10 12:20:16,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550570.0, ans=0.125
2024-08-10 12:20:26,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=550670.0, ans=0.2
2024-08-10 12:20:37,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=550770.0, ans=0.0
2024-08-10 12:20:38,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11600, loss[loss=0.106, beats_loss=0.0137, ecapa_loss=0.0001996, whisper_loss=0.09031, over 18446.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01208, ecapa_loss=0.0002573, whisper_loss=0.09624, over 3880447.39 frames. ], batch size: 69, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:20:44,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=550770.0, ans=0.0
2024-08-10 12:20:56,989 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
31 from LS+wenet, 20 from Vox, 41 from AS
2024-08-10 12:21:05,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550970.0, ans=0.1
2024-08-10 12:21:08,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.001e+01 3.473e+01 4.016e+01 7.053e+01, threshold=6.947e+01, percent-clipped=1.0
2024-08-10 12:21:30,980 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS
2024-08-10 12:21:38,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
2024-08-10 12:21:39,242 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS
2024-08-10 12:21:40,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2024-08-10 12:21:45,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=551270.0, ans=0.0
2024-08-10 12:21:46,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11650, loss[loss=0.1112, beats_loss=0.01409, ecapa_loss=0.000318, whisper_loss=0.09392, over 14670.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01201, ecapa_loss=0.000259, whisper_loss=0.09623, over 3883794.64 frames. ], batch size: 63, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:21:57,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=551270.0, ans=0.125
2024-08-10 12:22:06,475 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
26 from LS+wenet, 28 from Vox, 35 from AS
2024-08-10 12:22:06,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551370.0, ans=0.125
2024-08-10 12:22:18,089 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 from AS
2024-08-10 12:22:39,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551570.0, ans=0.1
2024-08-10 12:22:39,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=551570.0, ans=0.125
2024-08-10 12:22:55,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551670.0, ans=0.1
2024-08-10 12:22:57,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11700, loss[loss=0.1101, beats_loss=0.0133, ecapa_loss=0.0001974, whisper_loss=0.09484, over 23322.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01208, ecapa_loss=0.0002575, whisper_loss=0.09588, over 3917938.15 frames. ], batch size: 90, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:22:57,981 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 from AS
2024-08-10 12:23:05,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0
2024-08-10 12:23:07,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551770.0, ans=0.1
2024-08-10 12:23:11,442 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts.
26 from LS+wenet, 15 from Vox, 27 from AS
2024-08-10 12:23:31,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551970.0, ans=0.0
2024-08-10 12:23:32,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551970.0, ans=0.1
2024-08-10 12:23:33,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 3.233e+01 3.487e+01 4.046e+01 6.995e+01, threshold=6.974e+01, percent-clipped=1.0
2024-08-10 12:23:45,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=552070.0, ans=0.04949747468305833
2024-08-10 12:23:51,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5
2024-08-10 12:23:52,716 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 33 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 12:23:55,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=552070.0, ans=0.125
2024-08-10 12:24:01,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=552170.0, ans=0.125
2024-08-10 12:24:13,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11750, loss[loss=0.1093, beats_loss=0.01215, ecapa_loss=0.000251, whisper_loss=0.09463, over 21894.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01211, ecapa_loss=0.0002573, whisper_loss=0.09702, over 3952326.56 frames.
], batch size: 88, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:24:33,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552370.0, ans=0.1
2024-08-10 12:24:46,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=552470.0, ans=0.1
2024-08-10 12:25:00,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552570.0, ans=0.1
2024-08-10 12:25:01,511 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 20 from Vox, 26 from AS
2024-08-10 12:25:01,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=552570.0, ans=0.0
2024-08-10 12:25:07,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=552570.0, ans=15.0
2024-08-10 12:25:09,138 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS
2024-08-10 12:25:16,443 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 from AS
2024-08-10 12:25:29,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11800, loss[loss=0.09217, beats_loss=0.01374, ecapa_loss=0.0001852, whisper_loss=0.07657, over 16935.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01209, ecapa_loss=0.0002568, whisper_loss=0.09706, over 3951847.96 frames. ], batch size: 64, lr: 1.44e-02, grad_scale: 4294967296.0
2024-08-10 12:25:35,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552770.0, ans=0.125
2024-08-10 12:25:53,814 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
22 from LS+wenet, 21 from Vox, 45 from AS 2024-08-10 12:25:59,015 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 from AS 2024-08-10 12:26:04,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.876e+01 3.467e+01 4.028e+01 7.288e+01, threshold=6.933e+01, percent-clipped=1.0 2024-08-10 12:26:05,991 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 12:26:17,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2024-08-10 12:26:31,273 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 12:26:44,123 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11850, loss[loss=0.1179, beats_loss=0.01243, ecapa_loss=0.0002873, whisper_loss=0.1026, over 20889.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01211, ecapa_loss=0.000256, whisper_loss=0.09676, over 3966105.73 frames. ], batch size: 85, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:27:08,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=553370.0, ans=0.125 2024-08-10 12:27:09,299 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 12:27:09,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=553370.0, ans=0.5 2024-08-10 12:27:24,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=553470.0, ans=0.125 2024-08-10 12:27:30,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=553570.0, ans=10.0 2024-08-10 12:27:34,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=553570.0, ans=0.0 2024-08-10 12:27:35,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=553570.0, ans=0.125 2024-08-10 12:27:43,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553670.0, ans=0.1 2024-08-10 12:27:43,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=553670.0, ans=0.0 2024-08-10 12:27:57,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11900, loss[loss=0.1081, beats_loss=0.01146, ecapa_loss=0.0002554, whisper_loss=0.09409, over 17280.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01213, ecapa_loss=0.0002558, whisper_loss=0.09639, over 3971968.81 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:28:03,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=553770.0, ans=0.125 2024-08-10 12:28:08,950 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
17 from LS+wenet, 20 from Vox, 16 from AS 2024-08-10 12:28:19,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553870.0, ans=0.1 2024-08-10 12:28:27,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=553970.0, ans=0.125 2024-08-10 12:28:29,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553970.0, ans=0.1 2024-08-10 12:28:30,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.028e+01 3.380e+01 3.794e+01 5.730e+01, threshold=6.759e+01, percent-clipped=0.0 2024-08-10 12:28:34,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=553970.0, ans=22.5 2024-08-10 12:28:40,439 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0 2024-08-10 12:28:53,142 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 27 from Vox, 31 from AS 2024-08-10 12:29:00,249 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 from AS 2024-08-10 12:29:02,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-10 12:29:10,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 11950, loss[loss=0.09737, beats_loss=0.0128, ecapa_loss=0.0003264, whisper_loss=0.08131, over 15946.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01208, ecapa_loss=0.0002593, whisper_loss=0.09663, over 3957906.06 frames. ], batch size: 68, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:29:12,157 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
22 from LS+wenet, 26 from Vox, 34 from AS 2024-08-10 12:29:18,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=554270.0, ans=0.125 2024-08-10 12:29:23,020 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS 2024-08-10 12:29:23,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=12.0 2024-08-10 12:29:28,983 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 from AS 2024-08-10 12:29:32,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=554370.0, ans=0.125 2024-08-10 12:29:37,580 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 12:29:53,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=554570.0, ans=0.0 2024-08-10 12:29:56,368 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 from AS 2024-08-10 12:29:58,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-08-10 12:30:00,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-08-10 12:30:02,092 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 from AS 2024-08-10 12:30:06,664 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
18 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 12:30:10,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554670.0, ans=0.125 2024-08-10 12:30:12,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-10 12:30:22,880 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12000, loss[loss=0.1033, beats_loss=0.01598, ecapa_loss=0.0001867, whisper_loss=0.08544, over 20997.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01199, ecapa_loss=0.000259, whisper_loss=0.09718, over 3939447.48 frames. ], batch size: 84, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:30:22,880 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 12:30:59,960 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on ASR_libri: loss=0.2637, beats_loss=0, ecapa_loss=0.0007919, whisper_loss=0.2558, over 922467.00 frames. 2024-08-10 12:31:09,281 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6179, 4.2004, 3.3407, 3.6377], device='cuda:1') 2024-08-10 12:31:14,911 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4755, 2.4093, 2.1439, 1.4152], device='cuda:1') 2024-08-10 12:31:15,836 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7711, 2.4488, 2.7628, 1.0518, 1.2054, 2.0163, 2.9399, 2.7000], device='cuda:1') 2024-08-10 12:31:17,022 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on SV_voxceleb1: loss=0.006895, beats_loss=0, ecapa_loss=0.0006895, whisper_loss=0, over 939242.00 frames. 
2024-08-10 12:32:45,120 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7077, 2.2529, 1.5120, 1.2738, 1.2522, 1.2233, 1.6518, 1.5552], device='cuda:1') 2024-08-10 12:33:04,194 INFO [train_multi_KD3.py:1149] (1/4) Epoch 4, validation on AT_audioset: loss=0.02758, beats_loss=0.02758, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 12:33:04,198 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 12:33:12,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=554770.0, ans=0.0 2024-08-10 12:33:31,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=554870.0, ans=0.125 2024-08-10 12:33:37,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=554970.0, ans=0.125 2024-08-10 12:33:39,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.069e+01 3.344e+01 4.078e+01 6.277e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 12:33:43,375 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 from AS 2024-08-10 12:33:47,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=554970.0, ans=0.125 2024-08-10 12:34:05,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555070.0, ans=0.1 2024-08-10 12:34:07,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. 
limit=6.0 2024-08-10 12:34:20,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=555170.0, ans=0.2 2024-08-10 12:34:24,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12050, loss[loss=0.1403, beats_loss=0.009482, ecapa_loss=0.0002968, whisper_loss=0.1279, over 17597.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01199, ecapa_loss=0.0002595, whisper_loss=0.09673, over 3888188.09 frames. ], batch size: 69, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:34:35,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=555270.0, ans=0.125 2024-08-10 12:34:58,907 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 32 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 12:35:05,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=555470.0, ans=0.1 2024-08-10 12:35:22,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-10 12:35:33,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-08-10 12:35:33,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2024-08-10 12:35:46,372 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 27 from LS+wenet, 16 from Vox, 18 from AS 2024-08-10 12:35:46,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=555670.0, ans=0.0 2024-08-10 12:35:49,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12100, loss[loss=0.1015, beats_loss=0.01198, ecapa_loss=0.0002467, whisper_loss=0.08708, over 19789.00 frames. 
], tot_loss[loss=0.1119, beats_loss=0.01199, ecapa_loss=0.0002613, whisper_loss=0.09731, over 3855817.64 frames. ], batch size: 77, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:35:51,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=555770.0, ans=0.0 2024-08-10 12:35:55,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=555770.0, ans=0.0 2024-08-10 12:36:00,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555770.0, ans=0.1 2024-08-10 12:36:28,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=555970.0, ans=0.125 2024-08-10 12:36:30,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.425e+01 2.918e+01 3.194e+01 3.735e+01 7.690e+01, threshold=6.389e+01, percent-clipped=2.0 2024-08-10 12:36:37,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=555970.0, ans=0.2 2024-08-10 12:36:40,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=22.5 2024-08-10 12:36:42,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.67 vs. limit=22.5 2024-08-10 12:36:46,579 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 12:37:08,845 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 11 from Vox, 29 from AS 2024-08-10 12:37:15,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12150, loss[loss=0.1104, beats_loss=0.01412, ecapa_loss=0.0002065, whisper_loss=0.09424, over 19585.00 frames. 
], tot_loss[loss=0.1112, beats_loss=0.01201, ecapa_loss=0.0002627, whisper_loss=0.09655, over 3853976.78 frames. ], batch size: 75, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:37:18,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=556270.0, ans=0.125 2024-08-10 12:37:30,960 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 12:37:33,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=556370.0, ans=0.0 2024-08-10 12:37:41,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=556370.0, ans=0.035 2024-08-10 12:38:17,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=556670.0, ans=0.0 2024-08-10 12:38:27,719 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 12:38:34,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12200, loss[loss=0.09547, beats_loss=0.01135, ecapa_loss=0.0002536, whisper_loss=0.08158, over 16920.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01199, ecapa_loss=0.0002622, whisper_loss=0.09682, over 3872978.90 frames. ], batch size: 66, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:38:45,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=556770.0, ans=0.125 2024-08-10 12:38:46,749 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-08-10 12:38:52,626 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
31 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 12:38:59,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2024-08-10 12:39:03,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556870.0, ans=0.125 2024-08-10 12:39:04,859 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 from AS 2024-08-10 12:39:05,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556970.0, ans=0.125 2024-08-10 12:39:10,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.794e+01 3.177e+01 3.515e+01 6.137e+01, threshold=6.354e+01, percent-clipped=0.0 2024-08-10 12:39:17,094 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 23 from Vox, 18 from AS 2024-08-10 12:39:17,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=556970.0, ans=0.125 2024-08-10 12:39:24,067 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 12:39:25,897 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
24 from LS+wenet, 25 from Vox, 45 from AS 2024-08-10 12:39:33,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=557070.0, ans=0.125 2024-08-10 12:39:48,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557170.0, ans=0.1 2024-08-10 12:39:50,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557170.0, ans=0.1 2024-08-10 12:39:54,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12250, loss[loss=0.119, beats_loss=0.0106, ecapa_loss=0.0002289, whisper_loss=0.1061, over 15825.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01192, ecapa_loss=0.0002607, whisper_loss=0.09739, over 3891912.05 frames. ], batch size: 60, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:40:34,803 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 13 from Vox, 34 from AS 2024-08-10 12:40:46,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=557570.0, ans=0.125 2024-08-10 12:40:51,518 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 25 from Vox, 24 from AS 2024-08-10 12:40:53,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=557570.0, ans=0.125 2024-08-10 12:40:56,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=557570.0, ans=0.125 2024-08-10 12:41:01,474 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 from AS 2024-08-10 12:41:14,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12300, loss[loss=0.111, beats_loss=0.01133, ecapa_loss=0.0003027, whisper_loss=0.09665, over 22082.00 frames. 
], tot_loss[loss=0.1116, beats_loss=0.01195, ecapa_loss=0.0002593, whisper_loss=0.09709, over 3884920.61 frames. ], batch size: 89, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:41:20,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=557770.0, ans=0.125 2024-08-10 12:41:20,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=557770.0, ans=0.125 2024-08-10 12:41:23,702 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 from AS 2024-08-10 12:41:33,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=557870.0, ans=0.0 2024-08-10 12:41:34,890 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 12:41:49,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.037e+01 3.524e+01 3.995e+01 1.053e+02, threshold=7.048e+01, percent-clipped=4.0 2024-08-10 12:41:53,341 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 12:41:59,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=15.0 2024-08-10 12:42:03,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=558070.0, ans=0.0 2024-08-10 12:42:12,667 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 12:42:15,680 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 12:42:28,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-08-10 12:42:31,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=558270.0, ans=0.125 2024-08-10 12:42:33,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12350, loss[loss=0.1178, beats_loss=0.01079, ecapa_loss=0.0002352, whisper_loss=0.1046, over 20925.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01197, ecapa_loss=0.0002592, whisper_loss=0.09706, over 3883471.19 frames. ], batch size: 81, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:42:33,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=558270.0, ans=0.125 2024-08-10 12:42:34,362 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 15 from Vox, 38 from AS 2024-08-10 12:42:41,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2024-08-10 12:42:45,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. 
limit=15.0 2024-08-10 12:42:57,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=558370.0, ans=0.125 2024-08-10 12:42:59,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=558370.0, ans=0.0 2024-08-10 12:42:59,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558370.0, ans=0.125 2024-08-10 12:43:03,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=12.0 2024-08-10 12:43:18,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=558470.0, ans=0.0 2024-08-10 12:43:20,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=558470.0, ans=0.2 2024-08-10 12:43:20,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=558470.0, ans=0.125 2024-08-10 12:43:28,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=558570.0, ans=0.125 2024-08-10 12:43:28,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=558570.0, ans=0.2 2024-08-10 12:43:42,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=558670.0, ans=0.2 2024-08-10 12:43:46,390 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 12:43:49,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. 
limit=15.0 2024-08-10 12:44:01,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12400, loss[loss=0.1279, beats_loss=0.01068, ecapa_loss=0.0002495, whisper_loss=0.1147, over 23004.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01196, ecapa_loss=0.0002578, whisper_loss=0.09759, over 3886939.37 frames. ], batch size: 90, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:44:35,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558970.0, ans=0.125 2024-08-10 12:44:37,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=558970.0, ans=0.125 2024-08-10 12:44:41,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.951e+01 3.310e+01 3.895e+01 5.650e+01, threshold=6.619e+01, percent-clipped=0.0 2024-08-10 12:44:58,103 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 12:45:05,728 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 12:45:13,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=559170.0, ans=0.125 2024-08-10 12:45:26,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12450, loss[loss=0.1122, beats_loss=0.01405, ecapa_loss=0.0002564, whisper_loss=0.09562, over 22672.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01192, ecapa_loss=0.000258, whisper_loss=0.0971, over 3876051.40 frames. ], batch size: 93, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:45:28,152 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 from AS 2024-08-10 12:45:32,740 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 22 from Vox, 27 from AS 2024-08-10 12:45:42,926 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS 2024-08-10 12:45:55,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=559370.0, ans=0.125 2024-08-10 12:46:16,707 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 from AS 2024-08-10 12:46:18,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2024-08-10 12:46:27,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=559670.0, ans=0.0 2024-08-10 12:46:45,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12500, loss[loss=0.1204, beats_loss=0.009164, ecapa_loss=0.0002972, whisper_loss=0.1083, over 15641.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01191, ecapa_loss=0.0002602, whisper_loss=0.09736, over 3870465.30 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:46:49,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=559770.0, ans=0.125 2024-08-10 12:46:54,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=559770.0, ans=0.0 2024-08-10 12:47:02,762 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 from AS 2024-08-10 12:47:04,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=559870.0, ans=0.125 2024-08-10 12:47:06,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=559870.0, ans=12.0 2024-08-10 12:47:07,293 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 12:47:23,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-10 12:47:28,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 3.210e+01 3.616e+01 4.037e+01 8.521e+01, threshold=7.231e+01, percent-clipped=2.0 2024-08-10 12:47:56,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560170.0, ans=0.1 2024-08-10 12:48:12,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12550, loss[loss=0.1093, beats_loss=0.01239, ecapa_loss=0.00028, whisper_loss=0.09413, over 21249.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01196, ecapa_loss=0.0002596, whisper_loss=0.09742, over 3885544.11 frames. ], batch size: 88, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:48:18,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-08-10 12:48:21,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2024-08-10 12:48:46,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=560470.0, ans=0.0 2024-08-10 12:48:51,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=15.0 2024-08-10 12:49:01,730 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 17 from Vox, 43 from AS 2024-08-10 12:49:17,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=560670.0, ans=0.0 2024-08-10 12:49:19,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560670.0, ans=0.1 2024-08-10 12:49:21,600 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 from AS 2024-08-10 12:49:21,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560670.0, ans=0.125 2024-08-10 12:49:24,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=560670.0, ans=0.125 2024-08-10 12:49:26,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2024-08-10 12:49:29,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12600, loss[loss=0.113, beats_loss=0.01008, ecapa_loss=0.0003311, whisper_loss=0.09959, over 20652.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01192, ecapa_loss=0.00026, whisper_loss=0.09746, over 3889914.05 frames. ], batch size: 88, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:50:03,219 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 12:50:03,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=560970.0, ans=0.125 2024-08-10 12:50:06,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 3.110e+01 3.572e+01 4.096e+01 7.155e+01, threshold=7.143e+01, percent-clipped=0.0 2024-08-10 12:50:16,906 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 12:50:32,134 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-10 12:50:46,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12650, loss[loss=0.1154, beats_loss=0.01098, ecapa_loss=0.0002884, whisper_loss=0.1015, over 21209.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01201, ecapa_loss=0.0002601, whisper_loss=0.09726, over 3887289.21 frames. ], batch size: 84, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:51:03,254 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 12:51:04,516 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 12:51:05,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2024-08-10 12:51:20,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-10 12:51:32,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=561470.0, ans=10.0 2024-08-10 12:51:34,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561570.0, ans=0.125 2024-08-10 12:51:40,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561570.0, ans=0.125 2024-08-10 12:51:45,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2024-08-10 12:51:52,164 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 12:52:08,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12700, loss[loss=0.09092, beats_loss=0.01264, ecapa_loss=0.000255, whisper_loss=0.07573, over 17809.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01202, ecapa_loss=0.0002589, whisper_loss=0.09744, over 3900192.60 frames. ], batch size: 73, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:52:17,086 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 12:52:24,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561870.0, ans=0.125 2024-08-10 12:52:30,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-08-10 12:52:33,032 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-10 12:52:35,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=561870.0, ans=0.2 2024-08-10 12:52:38,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-10 12:52:44,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.862e+01 3.101e+01 3.673e+01 6.463e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-10 12:52:45,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=561970.0, ans=0.125 2024-08-10 12:53:14,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. 
limit=15.0 2024-08-10 12:53:26,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12750, loss[loss=0.1173, beats_loss=0.01121, ecapa_loss=0.0002583, whisper_loss=0.1035, over 21363.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01211, ecapa_loss=0.0002573, whisper_loss=0.09693, over 3911922.55 frames. ], batch size: 83, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:53:29,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562270.0, ans=0.1 2024-08-10 12:53:29,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=562270.0, ans=0.0 2024-08-10 12:53:41,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=562370.0, ans=0.0 2024-08-10 12:53:44,583 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:53:44,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=562370.0, ans=0.125 2024-08-10 12:53:50,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562370.0, ans=0.1 2024-08-10 12:53:53,636 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 12:53:54,755 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 12:53:59,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=562470.0, ans=0.0 2024-08-10 12:54:09,847 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 12:54:25,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562670.0, ans=0.125 2024-08-10 12:54:31,857 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2024-08-10 12:54:34,205 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 12:54:42,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12800, loss[loss=0.1086, beats_loss=0.01164, ecapa_loss=0.0002909, whisper_loss=0.09402, over 21953.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01224, ecapa_loss=0.0002592, whisper_loss=0.09596, over 3902003.35 frames. ], batch size: 91, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:54:48,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=562770.0, ans=0.0 2024-08-10 12:55:17,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.950e+01 3.581e+01 4.034e+01 6.155e+01, threshold=7.162e+01, percent-clipped=0.0 2024-08-10 12:55:21,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562970.0, ans=0.1 2024-08-10 12:55:46,637 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 12:55:52,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=563170.0, ans=22.5 2024-08-10 12:55:55,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12850, loss[loss=0.1012, beats_loss=0.01333, ecapa_loss=0.0002154, whisper_loss=0.08576, over 21017.00 frames. 
], tot_loss[loss=0.1099, beats_loss=0.01226, ecapa_loss=0.0002583, whisper_loss=0.09501, over 3855182.70 frames. ], batch size: 84, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:56:02,325 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 12:56:18,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=563370.0, ans=0.2 2024-08-10 12:56:37,852 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 12:56:45,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563570.0, ans=0.1 2024-08-10 12:56:47,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563570.0, ans=0.125 2024-08-10 12:56:47,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-08-10 12:56:48,350 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 11 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 12:57:00,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=563670.0, ans=0.125 2024-08-10 12:57:05,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=563770.0, ans=0.0 2024-08-10 12:57:05,801 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12900, loss[loss=0.1331, beats_loss=0.01189, ecapa_loss=0.0001776, whisper_loss=0.1194, over 15677.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01229, ecapa_loss=0.0002577, whisper_loss=0.09446, over 3862630.99 frames. 
], batch size: 55, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:57:10,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=12.0 2024-08-10 12:57:12,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=563770.0, ans=0.035 2024-08-10 12:57:15,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=563770.0, ans=0.125 2024-08-10 12:57:28,389 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 12:57:31,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-08-10 12:57:37,291 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 12:57:37,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2024-08-10 12:57:38,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.781e+01 3.198e+01 3.852e+01 6.418e+01, threshold=6.396e+01, percent-clipped=0.0 2024-08-10 12:57:49,312 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 12:58:02,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=564170.0, ans=0.025 2024-08-10 12:58:04,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=564170.0, ans=0.125 2024-08-10 12:58:09,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=564170.0, ans=0.125 2024-08-10 12:58:15,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 12950, loss[loss=0.144, beats_loss=0.007345, ecapa_loss=0.0003743, whisper_loss=0.1329, over 16790.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01202, ecapa_loss=0.0002611, whisper_loss=0.09502, over 3845079.54 frames. ], batch size: 73, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:58:17,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564270.0, ans=0.1 2024-08-10 12:58:34,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=564370.0, ans=0.2 2024-08-10 12:58:46,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=564470.0, ans=0.2 2024-08-10 12:58:54,246 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-10 12:58:59,541 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 12:59:03,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=564570.0, ans=0.125 2024-08-10 12:59:06,448 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.172e-01 2024-08-10 12:59:23,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13000, loss[loss=0.1426, beats_loss=0.009352, ecapa_loss=0.0003409, whisper_loss=0.1298, over 13001.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01194, ecapa_loss=0.0002613, whisper_loss=0.09613, over 3877158.34 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:59:29,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=564770.0, ans=0.0 2024-08-10 12:59:36,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=564870.0, ans=0.0 2024-08-10 12:59:50,883 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 12:59:51,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=564970.0, ans=0.0 2024-08-10 12:59:54,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.945e+01 3.366e+01 4.220e+01 5.870e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 13:00:17,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2024-08-10 13:00:21,341 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 13:00:32,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13050, loss[loss=0.09774, beats_loss=0.01247, ecapa_loss=0.0002558, whisper_loss=0.08272, over 14189.00 frames. 
], tot_loss[loss=0.1107, beats_loss=0.01194, ecapa_loss=0.0002607, whisper_loss=0.09617, over 3843519.05 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:00:33,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=565270.0, ans=0.125 2024-08-10 13:01:08,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=565470.0, ans=0.125 2024-08-10 13:01:14,090 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 13:01:18,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=565570.0, ans=0.125 2024-08-10 13:01:42,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=565670.0, ans=0.09899494936611666 2024-08-10 13:01:48,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13100, loss[loss=0.1099, beats_loss=0.0126, ecapa_loss=0.0002427, whisper_loss=0.09485, over 21123.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01197, ecapa_loss=0.0002579, whisper_loss=0.09648, over 3854268.45 frames. ], batch size: 86, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:01:50,497 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 13:01:51,969 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 13:01:53,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565770.0, ans=0.125 2024-08-10 13:02:02,110 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 13:02:05,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=565870.0, ans=0.125 2024-08-10 13:02:15,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565870.0, ans=0.125 2024-08-10 13:02:26,202 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 3.014e+01 3.400e+01 3.856e+01 6.675e+01, threshold=6.801e+01, percent-clipped=0.0 2024-08-10 13:02:27,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=565970.0, ans=0.1 2024-08-10 13:02:32,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=565970.0, ans=0.125 2024-08-10 13:02:34,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=565970.0, ans=0.125 2024-08-10 13:02:36,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=566070.0, ans=0.0 2024-08-10 13:02:39,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=566070.0, ans=0.125 2024-08-10 13:02:42,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=566070.0, ans=0.125 2024-08-10 13:03:09,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=566270.0, ans=0.2 2024-08-10 13:03:10,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13150, loss[loss=0.09915, beats_loss=0.01062, ecapa_loss=0.0002963, whisper_loss=0.08557, over 17529.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01198, ecapa_loss=0.000257, whisper_loss=0.09561, over 3830038.21 frames. 
], batch size: 71, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:03:12,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=566270.0, ans=0.125 2024-08-10 13:03:27,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=566370.0, ans=0.0 2024-08-10 13:03:42,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=566470.0, ans=0.125 2024-08-10 13:03:47,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2024-08-10 13:04:03,330 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 10 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 13:04:27,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=566670.0, ans=10.0 2024-08-10 13:04:31,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13200, loss[loss=0.1361, beats_loss=0.01037, ecapa_loss=0.0002365, whisper_loss=0.1234, over 23454.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01205, ecapa_loss=0.000257, whisper_loss=0.09511, over 3820508.37 frames. ], batch size: 89, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:04:39,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566770.0, ans=0.1 2024-08-10 13:04:48,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=566870.0, ans=0.04949747468305833 2024-08-10 13:05:07,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.868e+01 3.486e+01 3.850e+01 5.808e+01, threshold=6.972e+01, percent-clipped=0.0 2024-08-10 13:05:21,150 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 13:05:21,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-10 13:05:24,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-10 13:05:34,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=15.0 2024-08-10 13:05:35,457 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 13:05:50,049 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13250, loss[loss=0.107, beats_loss=0.01346, ecapa_loss=0.000223, whisper_loss=0.09134, over 16636.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01192, ecapa_loss=0.0002608, whisper_loss=0.09605, over 3826164.24 frames. ], batch size: 64, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:06:24,029 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 13:06:42,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=22.5 2024-08-10 13:07:00,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=567670.0, ans=0.5 2024-08-10 13:07:11,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13300, loss[loss=0.1147, beats_loss=0.01042, ecapa_loss=0.0002879, whisper_loss=0.1014, over 22415.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01201, ecapa_loss=0.00026, whisper_loss=0.0953, over 3826982.73 frames. 
], batch size: 91, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:07:15,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567770.0, ans=0.1 2024-08-10 13:07:29,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:29,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:34,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:48,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 3.028e+01 3.547e+01 3.970e+01 7.425e+01, threshold=7.095e+01, percent-clipped=1.0 2024-08-10 13:07:55,760 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 13:08:14,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=568170.0, ans=0.1 2024-08-10 13:08:21,114 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 13:08:30,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=568270.0, ans=0.125 2024-08-10 13:08:31,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13350, loss[loss=0.1017, beats_loss=0.01384, ecapa_loss=0.0002257, whisper_loss=0.08558, over 18094.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01197, ecapa_loss=0.0002617, whisper_loss=0.09597, over 3859835.76 frames. ], batch size: 72, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:08:33,723 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 13:08:37,020 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 13:08:46,256 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 13:08:51,214 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 13:08:57,133 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 13:09:01,385 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 13:09:09,091 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 13:09:13,530 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 42 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 13:09:31,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=568670.0, ans=0.0 2024-08-10 13:09:46,860 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13400, loss[loss=0.1044, beats_loss=0.01064, ecapa_loss=0.0002978, whisper_loss=0.09075, over 22183.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01194, ecapa_loss=0.0002617, whisper_loss=0.09665, over 3881340.00 frames. ], batch size: 90, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:09:48,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=568770.0, ans=0.0 2024-08-10 13:09:53,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.42 vs. limit=22.5 2024-08-10 13:10:02,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. 
limit=15.0 2024-08-10 13:10:03,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=568870.0, ans=0.125 2024-08-10 13:10:09,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568870.0, ans=0.1 2024-08-10 13:10:10,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-10 13:10:17,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=568970.0, ans=0.0 2024-08-10 13:10:18,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.910e+01 3.380e+01 3.958e+01 6.126e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-10 13:10:32,148 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.466e-01 2024-08-10 13:10:37,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=569070.0, ans=0.125 2024-08-10 13:10:55,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569270.0, ans=0.1 2024-08-10 13:10:56,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13450, loss[loss=0.1214, beats_loss=0.01061, ecapa_loss=0.0002343, whisper_loss=0.1084, over 18816.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01199, ecapa_loss=0.000261, whisper_loss=0.09656, over 3892797.07 frames. 
], batch size: 70, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:11:11,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=569370.0, ans=0.125 2024-08-10 13:11:56,678 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.177e+05 2024-08-10 13:12:04,058 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13500, loss[loss=0.09027, beats_loss=0.01442, ecapa_loss=0.0002804, whisper_loss=0.07305, over 18717.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01197, ecapa_loss=0.000261, whisper_loss=0.09692, over 3907738.81 frames. ], batch size: 77, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:12:04,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.10 vs. limit=15.0 2024-08-10 13:12:06,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. limit=10.0 2024-08-10 13:12:10,054 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 13:12:10,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=569770.0, ans=0.0 2024-08-10 13:12:12,695 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 13:12:14,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.47 vs. limit=15.0 2024-08-10 13:12:17,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. 
limit=10.0 2024-08-10 13:12:20,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=569870.0, ans=0.2 2024-08-10 13:12:23,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=569870.0, ans=0.0 2024-08-10 13:12:25,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=569870.0, ans=0.125 2024-08-10 13:12:33,329 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-10 13:12:33,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569970.0, ans=0.125 2024-08-10 13:12:35,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.996e+01 3.434e+01 4.154e+01 6.721e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 13:12:37,205 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 17 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 13:12:52,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-08-10 13:13:03,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-08-10 13:13:11,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13550, loss[loss=0.1205, beats_loss=0.009286, ecapa_loss=0.0002588, whisper_loss=0.1087, over 19004.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01194, ecapa_loss=0.0002597, whisper_loss=0.09597, over 3883549.36 frames. 
], batch size: 75, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:13:17,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=570270.0, ans=0.0 2024-08-10 13:13:22,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=570270.0, ans=0.0 2024-08-10 13:13:41,088 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 13:13:50,042 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 13:13:56,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=570570.0, ans=0.0 2024-08-10 13:13:58,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570570.0, ans=0.1 2024-08-10 13:14:08,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=570670.0, ans=0.125 2024-08-10 13:14:16,938 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13600, loss[loss=0.1096, beats_loss=0.01073, ecapa_loss=0.0002974, whisper_loss=0.09587, over 14337.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01194, ecapa_loss=0.000257, whisper_loss=0.09547, over 3852605.03 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:14:38,072 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
36 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 13:14:47,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.840e+01 3.248e+01 3.798e+01 4.801e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 13:14:53,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=570970.0, ans=0.125 2024-08-10 13:14:56,909 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 13:15:05,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=571070.0, ans=0.2 2024-08-10 13:15:05,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=571070.0, ans=0.025 2024-08-10 13:15:12,898 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 13:15:13,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571170.0, ans=0.1 2024-08-10 13:15:22,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13650, loss[loss=0.1011, beats_loss=0.009671, ecapa_loss=0.0003136, whisper_loss=0.08834, over 15253.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01199, ecapa_loss=0.000258, whisper_loss=0.0951, over 3841422.02 frames. ], batch size: 63, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:15:33,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. 
limit=15.0 2024-08-10 13:15:45,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=571370.0, ans=0.1 2024-08-10 13:15:54,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-10 13:16:12,861 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 13:16:22,363 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 17 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 13:16:30,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13700, loss[loss=0.1196, beats_loss=0.01137, ecapa_loss=0.0002538, whisper_loss=0.1057, over 21948.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01201, ecapa_loss=0.0002573, whisper_loss=0.09524, over 3839848.93 frames. ], batch size: 90, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:17:01,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.889e+01 3.237e+01 4.000e+01 5.503e+01, threshold=6.474e+01, percent-clipped=0.0 2024-08-10 13:17:16,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=12.0 2024-08-10 13:17:19,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=572070.0, ans=0.125 2024-08-10 13:17:27,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=572170.0, ans=0.0 2024-08-10 13:17:29,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=12.0 2024-08-10 13:17:38,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13750, loss[loss=0.1119, beats_loss=0.01293, ecapa_loss=0.0002042, whisper_loss=0.09688, over 16801.00 frames. 
], tot_loss[loss=0.1102, beats_loss=0.01202, ecapa_loss=0.000257, whisper_loss=0.09559, over 3843240.51 frames. ], batch size: 65, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:17:40,988 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 13:17:46,040 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 13:17:50,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=572370.0, ans=0.2 2024-08-10 13:18:16,416 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 13:18:26,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=572570.0, ans=0.2 2024-08-10 13:18:30,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=572570.0, ans=0.125 2024-08-10 13:18:31,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=572670.0, ans=0.125 2024-08-10 13:18:35,597 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:18:46,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13800, loss[loss=0.1019, beats_loss=0.01253, ecapa_loss=0.0003264, whisper_loss=0.08612, over 21137.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01206, ecapa_loss=0.0002559, whisper_loss=0.09557, over 3870836.12 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:18:49,349 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 13:18:50,717 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-10 13:19:04,751 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 13:19:07,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=572870.0, ans=0.0 2024-08-10 13:19:18,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 2.990e+01 3.419e+01 4.092e+01 5.899e+01, threshold=6.838e+01, percent-clipped=0.0 2024-08-10 13:19:28,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573070.0, ans=0.1 2024-08-10 13:19:29,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=573070.0, ans=0.125 2024-08-10 13:19:51,142 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 13:19:51,422 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:19:54,813 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13850, loss[loss=0.1307, beats_loss=0.01049, ecapa_loss=0.0002318, whisper_loss=0.1179, over 23486.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01204, ecapa_loss=0.0002552, whisper_loss=0.09589, over 3914734.25 frames. ], batch size: 91, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:20:20,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=573370.0, ans=0.125 2024-08-10 13:20:21,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. 
limit=6.0 2024-08-10 13:20:23,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2024-08-10 13:20:37,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-10 13:20:51,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=573670.0, ans=0.125 2024-08-10 13:21:03,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13900, loss[loss=0.09035, beats_loss=0.01421, ecapa_loss=0.0002523, whisper_loss=0.07362, over 21456.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01201, ecapa_loss=0.0002543, whisper_loss=0.09675, over 3936461.00 frames. ], batch size: 91, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:21:17,424 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 13:21:35,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 3.046e+01 3.391e+01 3.778e+01 5.936e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 13:21:46,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=574070.0, ans=0.015 2024-08-10 13:22:03,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2024-08-10 13:22:05,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=574170.0, ans=0.125 2024-08-10 13:22:13,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 13950, loss[loss=0.117, beats_loss=0.01045, ecapa_loss=0.0002311, whisper_loss=0.1043, over 22452.00 frames. 
], tot_loss[loss=0.1112, beats_loss=0.01198, ecapa_loss=0.0002535, whisper_loss=0.09664, over 3905635.99 frames. ], batch size: 90, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:22:24,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=574270.0, ans=0.2 2024-08-10 13:22:26,901 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 13:22:35,578 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 13:22:37,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=574370.0, ans=0.0 2024-08-10 13:22:41,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-10 13:22:52,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-10 13:23:02,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574570.0, ans=0.1 2024-08-10 13:23:11,686 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 13:23:12,990 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 13:23:14,363 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 13:23:17,338 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 13:23:22,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14000, loss[loss=0.1048, beats_loss=0.01142, ecapa_loss=0.0002778, whisper_loss=0.09057, over 16261.00 frames. 
], tot_loss[loss=0.1107, beats_loss=0.01201, ecapa_loss=0.0002514, whisper_loss=0.09614, over 3907108.73 frames. ], batch size: 67, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:23:26,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=574770.0, ans=0.0 2024-08-10 13:23:41,388 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-08-10 13:23:55,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.914e+01 3.235e+01 3.866e+01 6.339e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-10 13:23:56,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2024-08-10 13:24:07,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=575070.0, ans=0.125 2024-08-10 13:24:10,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=575070.0, ans=0.0 2024-08-10 13:24:34,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14050, loss[loss=0.1163, beats_loss=0.01278, ecapa_loss=0.0002602, whisper_loss=0.101, over 23269.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01205, ecapa_loss=0.0002526, whisper_loss=0.09571, over 3886538.53 frames. ], batch size: 92, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:24:40,972 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 13:24:50,041 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 13:24:50,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=575370.0, ans=0.2 2024-08-10 13:24:57,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=575370.0, ans=0.2 2024-08-10 13:25:11,272 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 13:25:14,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=12.0 2024-08-10 13:25:19,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0 2024-08-10 13:25:33,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=575670.0, ans=0.0 2024-08-10 13:25:47,594 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14100, loss[loss=0.09244, beats_loss=0.01362, ecapa_loss=0.0002452, whisper_loss=0.07637, over 19595.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01203, ecapa_loss=0.00025, whisper_loss=0.09615, over 3890215.61 frames. ], batch size: 80, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:25:49,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=575770.0, ans=0.125 2024-08-10 13:25:52,351 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-10 13:25:54,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=575770.0, ans=0.125 2024-08-10 13:26:13,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=575870.0, ans=0.125 2024-08-10 13:26:25,062 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.001e+01 3.648e+01 4.223e+01 8.641e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 13:26:25,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=575970.0, ans=0.0 2024-08-10 13:26:33,307 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-10 13:26:36,927 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 13:26:37,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=12.0 2024-08-10 13:26:57,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.03 vs. limit=22.5 2024-08-10 13:26:58,936 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 13:27:03,699 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 13:27:07,054 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-10 13:27:08,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14150, loss[loss=0.1364, beats_loss=0.01006, ecapa_loss=0.0003591, whisper_loss=0.1227, over 14414.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01198, ecapa_loss=0.0002528, whisper_loss=0.09655, over 3902111.23 frames. 
], batch size: 58, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:27:28,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-10 13:27:35,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-10 13:27:35,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2024-08-10 13:28:27,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=576670.0, ans=0.1 2024-08-10 13:28:32,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14200, loss[loss=0.1024, beats_loss=0.01123, ecapa_loss=0.0002106, whisper_loss=0.08909, over 17847.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01203, ecapa_loss=0.0002513, whisper_loss=0.09636, over 3921932.08 frames. ], batch size: 66, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:28:47,969 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 13:29:17,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 3.094e+01 3.432e+01 3.863e+01 7.530e+01, threshold=6.863e+01, percent-clipped=1.0 2024-08-10 13:29:53,932 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 13:30:08,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14250, loss[loss=0.1004, beats_loss=0.01336, ecapa_loss=0.0001893, whisper_loss=0.0851, over 15025.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0121, ecapa_loss=0.0002488, whisper_loss=0.09552, over 3893100.93 frames. 
], batch size: 58, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:30:18,973 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:30:33,363 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 39 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 13:30:49,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=577470.0, ans=0.0 2024-08-10 13:31:16,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2024-08-10 13:31:29,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=577570.0, ans=0.125 2024-08-10 13:31:56,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14300, loss[loss=0.09446, beats_loss=0.01237, ecapa_loss=0.0002514, whisper_loss=0.07957, over 21976.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01208, ecapa_loss=0.0002495, whisper_loss=0.09572, over 3919846.64 frames. ], batch size: 88, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:32:01,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=577770.0, ans=0.125 2024-08-10 13:32:02,728 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 13:32:20,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=577870.0, ans=0.09899494936611666 2024-08-10 13:32:22,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=577870.0, ans=0.125 2024-08-10 13:32:32,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=577870.0, ans=0.125 2024-08-10 13:32:44,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.908e+01 3.226e+01 3.811e+01 6.354e+01, threshold=6.452e+01, percent-clipped=0.0 2024-08-10 13:32:46,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=577970.0, ans=0.125 2024-08-10 13:32:53,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=577970.0, ans=0.05 2024-08-10 13:32:59,856 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 13:33:42,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14350, loss[loss=0.09907, beats_loss=0.01451, ecapa_loss=0.0002672, whisper_loss=0.08188, over 21168.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0121, ecapa_loss=0.0002497, whisper_loss=0.09589, over 3885801.93 frames. ], batch size: 92, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:33:48,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=578270.0, ans=0.125 2024-08-10 13:33:50,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=578270.0, ans=0.125 2024-08-10 13:33:55,017 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 13:33:55,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=578270.0, ans=0.0 2024-08-10 13:34:04,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-10 13:34:12,629 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 13:34:19,758 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 13:34:25,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578570.0, ans=0.1 2024-08-10 13:34:26,374 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 13:34:39,160 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 13:34:44,620 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 13:34:52,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14400, loss[loss=0.09566, beats_loss=0.01354, ecapa_loss=0.0002755, whisper_loss=0.07936, over 21664.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01203, ecapa_loss=0.0002531, whisper_loss=0.0968, over 3907003.20 frames. ], batch size: 90, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:34:54,356 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 13:35:00,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 13:35:00,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=578770.0, ans=0.0 2024-08-10 13:35:02,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=578770.0, ans=0.0 2024-08-10 13:35:02,694 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:35:03,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=578770.0, ans=0.0 2024-08-10 13:35:18,273 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 13:35:24,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 3.244e+01 3.522e+01 4.448e+01 1.287e+02, threshold=7.043e+01, percent-clipped=5.0 2024-08-10 13:35:30,351 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 13:35:30,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=578970.0, ans=0.0 2024-08-10 13:35:44,776 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-10 13:35:52,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2024-08-10 13:36:02,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 4, batch 14450, loss[loss=0.1251, beats_loss=0.01259, ecapa_loss=0.000255, whisper_loss=0.1099, over 23351.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.0121, ecapa_loss=0.0002541, whisper_loss=0.09649, over 3904759.64 frames. 
], batch size: 89, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:36:06,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579270.0, ans=0.125 2024-08-10 13:36:22,996 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 13:36:34,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=579470.0, ans=0.0 2024-08-10 13:36:34,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579470.0, ans=0.1 2024-08-10 13:36:40,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=579470.0, ans=0.0 2024-08-10 13:36:45,747 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 13:37:44,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 0, loss[loss=0.1009, beats_loss=0.0115, ecapa_loss=0.0002758, whisper_loss=0.08667, over 18485.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0115, ecapa_loss=0.0002758, whisper_loss=0.08667, over 18485.00 frames. ], batch size: 74, lr: 1.31e-02, grad_scale: 8589934592.0 2024-08-10 13:37:44,847 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 13:38:27,671 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007699, whisper_loss=0.2545, over 922467.00 frames. 2024-08-10 13:38:42,810 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on SV_voxceleb1: loss=0.006763, beats_loss=0, ecapa_loss=0.0006763, whisper_loss=0, over 939242.00 frames. 2024-08-10 13:40:39,866 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on AT_audioset: loss=0.02719, beats_loss=0.02719, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 13:40:39,869 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 13:40:56,029 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 13:41:05,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=579820.0, ans=0.2 2024-08-10 13:41:22,701 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 13:41:25,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=579820.0, ans=0.125 2024-08-10 13:41:40,550 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 13:41:53,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 3.049e+01 3.546e+01 4.164e+01 6.478e+01, threshold=7.092e+01, percent-clipped=0.0 2024-08-10 13:41:56,968 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 13:42:46,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 50, loss[loss=0.1216, beats_loss=0.01026, ecapa_loss=0.0002651, whisper_loss=0.1087, over 14995.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01125, ecapa_loss=0.0002674, whisper_loss=0.09763, over 874964.89 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:42:46,644 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 13:43:00,874 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 13:43:34,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=580420.0, ans=0.04949747468305833 2024-08-10 13:43:39,002 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
17 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 13:43:41,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2024-08-10 13:43:46,790 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:44:05,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=15.0 2024-08-10 13:44:17,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=580620.0, ans=0.125 2024-08-10 13:44:19,293 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 13:44:23,672 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 13:44:37,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=580620.0, ans=0.0 2024-08-10 13:44:41,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 100, loss[loss=0.09241, beats_loss=0.00947, ecapa_loss=0.0003308, whisper_loss=0.07963, over 16423.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01171, ecapa_loss=0.0002588, whisper_loss=0.0951, over 1552137.69 frames. ], batch size: 64, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:45:16,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=580820.0, ans=0.0 2024-08-10 13:45:19,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-10 13:45:39,968 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 13:45:43,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 3.227e+01 3.615e+01 4.275e+01 6.139e+01, threshold=7.229e+01, percent-clipped=0.0 2024-08-10 13:45:45,317 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-10 13:46:04,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=581020.0, ans=0.125 2024-08-10 13:46:14,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581120.0, ans=0.0 2024-08-10 13:46:15,589 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-10 13:46:27,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 150, loss[loss=0.114, beats_loss=0.01189, ecapa_loss=0.0002232, whisper_loss=0.09983, over 20586.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01164, ecapa_loss=0.0002522, whisper_loss=0.0948, over 2053246.67 frames. ], batch size: 80, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:46:33,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=581220.0, ans=0.0 2024-08-10 13:46:36,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=581220.0, ans=0.125 2024-08-10 13:46:49,568 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 13:47:07,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=581420.0, ans=0.0 2024-08-10 13:47:28,937 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 17 from Vox, 38 from AS 2024-08-10 13:47:29,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0 2024-08-10 13:47:30,457 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 13:47:31,837 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 13:47:43,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=581620.0, ans=0.2 2024-08-10 13:47:46,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 200, loss[loss=0.1083, beats_loss=0.01263, ecapa_loss=0.0002338, whisper_loss=0.09337, over 14186.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01157, ecapa_loss=0.0002517, whisper_loss=0.0964, over 2415815.44 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:48:16,420 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 13:48:28,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 3.079e+01 3.499e+01 4.044e+01 6.352e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-10 13:48:28,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=581920.0, ans=0.125 2024-08-10 13:48:46,398 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS 2024-08-10 13:48:48,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.30 vs. limit=22.5 2024-08-10 13:49:01,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 250, loss[loss=0.09917, beats_loss=0.01406, ecapa_loss=0.0002283, whisper_loss=0.08283, over 19460.00 frames.
], tot_loss[loss=0.1097, beats_loss=0.01151, ecapa_loss=0.000252, whisper_loss=0.09571, over 2702267.19 frames. ], batch size: 79, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:49:30,314 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-10 13:49:36,633 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 from AS 2024-08-10 13:49:46,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582520.0, ans=0.125 2024-08-10 13:49:53,785 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 14 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 13:50:01,401 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 13:50:05,976 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 13:50:16,404 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 300, loss[loss=0.1179, beats_loss=0.01141, ecapa_loss=0.0001747, whisper_loss=0.1048, over 15883.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01154, ecapa_loss=0.0002475, whisper_loss=0.09561, over 2902229.15 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:50:19,805 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts.
19 from LS+wenet, 14 from Vox, 20 from AS 2024-08-10 13:50:23,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582720.0, ans=0.1 2024-08-10 13:50:30,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=582820.0, ans=0.2 2024-08-10 13:50:58,480 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.973e+01 3.374e+01 4.127e+01 8.161e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 13:51:16,330 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 13:51:25,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=583120.0, ans=0.0 2024-08-10 13:51:27,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=583120.0, ans=0.2 2024-08-10 13:51:30,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 350, loss[loss=0.1221, beats_loss=0.01166, ecapa_loss=0.0002373, whisper_loss=0.1081, over 21866.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01179, ecapa_loss=0.0002437, whisper_loss=0.09466, over 3132121.40 frames. ], batch size: 84, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:51:39,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=583220.0, ans=0.0 2024-08-10 13:52:16,694 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 13:52:19,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-10 13:52:34,478 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
27 from LS+wenet, 23 from Vox, 43 from AS 2024-08-10 13:52:35,716 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 from AS 2024-08-10 13:52:43,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 400, loss[loss=0.08305, beats_loss=0.01105, ecapa_loss=0.000222, whisper_loss=0.06978, over 15624.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01181, ecapa_loss=0.0002443, whisper_loss=0.09475, over 3262567.72 frames. ], batch size: 61, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:52:51,044 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 13:52:55,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=583720.0, ans=0.0 2024-08-10 13:53:05,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=583820.0, ans=0.125 2024-08-10 13:53:11,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=583820.0, ans=0.0 2024-08-10 13:53:25,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.918e+01 3.260e+01 3.754e+01 7.890e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-10 13:53:40,995 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 23 from Vox, 28 from AS 2024-08-10 13:53:44,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=584120.0, ans=0.125 2024-08-10 13:53:44,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=12.0 2024-08-10 13:53:54,959 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts.
16 from LS+wenet, 25 from Vox, 32 from AS 2024-08-10 13:53:55,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-10 13:53:59,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584220.0, ans=0.1 2024-08-10 13:53:59,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-10 13:54:00,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 450, loss[loss=0.1122, beats_loss=0.01219, ecapa_loss=0.0002332, whisper_loss=0.09773, over 20356.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01172, ecapa_loss=0.0002462, whisper_loss=0.09539, over 3394527.61 frames. ], batch size: 75, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:54:00,711 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 13:54:08,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=584220.0, ans=0.07 2024-08-10 13:54:09,387 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 from AS 2024-08-10 13:54:19,046 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts.
19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 13:54:23,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=584320.0, ans=0.125 2024-08-10 13:54:31,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=584420.0, ans=0.0 2024-08-10 13:54:37,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=584420.0, ans=0.125 2024-08-10 13:54:38,059 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 13:54:38,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=584420.0, ans=0.2 2024-08-10 13:54:42,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.92 vs. limit=5.0 2024-08-10 13:55:10,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584620.0, ans=0.1 2024-08-10 13:55:12,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 500, loss[loss=0.1134, beats_loss=0.01192, ecapa_loss=0.0002595, whisper_loss=0.09889, over 21649.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01186, ecapa_loss=0.0002431, whisper_loss=0.09477, over 3508445.01 frames. ], batch size: 89, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:55:13,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=584720.0, ans=0.0 2024-08-10 13:55:17,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-10 13:55:32,412 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 13:55:32,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=584820.0, ans=0.0 2024-08-10 13:55:52,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.793e+01 3.161e+01 3.607e+01 7.948e+01, threshold=6.322e+01, percent-clipped=1.0 2024-08-10 13:55:52,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=584920.0, ans=0.0 2024-08-10 13:55:53,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-10 13:55:58,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-10 13:56:02,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585020.0, ans=0.1 2024-08-10 13:56:24,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 550, loss[loss=0.1249, beats_loss=0.01203, ecapa_loss=0.0002907, whisper_loss=0.11, over 19820.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01181, ecapa_loss=0.0002439, whisper_loss=0.09417, over 3589637.47 frames. ], batch size: 84, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:56:32,576 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS 2024-08-10 13:57:36,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 600, loss[loss=0.09537, beats_loss=0.01279, ecapa_loss=0.0001881, whisper_loss=0.0807, over 23151.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01193, ecapa_loss=0.00024, whisper_loss=0.09425, over 3671679.78 frames.
], batch size: 92, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:57:45,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585720.0, ans=0.125 2024-08-10 13:57:57,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=585820.0, ans=0.5 2024-08-10 13:58:03,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2024-08-10 13:58:16,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.818e+01 3.113e+01 3.779e+01 5.763e+01, threshold=6.225e+01, percent-clipped=0.0 2024-08-10 13:58:18,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586020.0, ans=0.1 2024-08-10 13:58:21,269 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 13:58:28,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=586020.0, ans=0.125 2024-08-10 13:58:34,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=586120.0, ans=0.1 2024-08-10 13:58:44,554 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 from AS 2024-08-10 13:58:48,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 650, loss[loss=0.1227, beats_loss=0.009275, ecapa_loss=0.0002018, whisper_loss=0.1114, over 23603.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0119, ecapa_loss=0.0002405, whisper_loss=0.0941, over 3717244.64 frames.
], batch size: 89, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:58:53,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=586220.0, ans=0.0 2024-08-10 13:58:57,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=586220.0, ans=0.025 2024-08-10 13:59:02,993 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 24 from Vox, 16 from AS 2024-08-10 13:59:18,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=586420.0, ans=0.0 2024-08-10 13:59:29,361 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 13:59:31,994 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS 2024-08-10 13:59:46,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=586620.0, ans=0.125 2024-08-10 13:59:58,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 700, loss[loss=0.1003, beats_loss=0.01355, ecapa_loss=0.0001951, whisper_loss=0.08476, over 18412.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01187, ecapa_loss=0.0002397, whisper_loss=0.09405, over 3717013.35 frames. ], batch size: 72, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:00:03,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=586720.0, ans=0.125 2024-08-10 14:00:07,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586720.0, ans=0.1 2024-08-10 14:00:22,932 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts.
20 from LS+wenet, 11 from Vox, 33 from AS 2024-08-10 14:00:38,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.878e+01 3.235e+01 3.847e+01 7.521e+01, threshold=6.470e+01, percent-clipped=2.0 2024-08-10 14:01:07,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=587120.0, ans=0.0 2024-08-10 14:01:11,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 750, loss[loss=0.1037, beats_loss=0.01288, ecapa_loss=0.0002764, whisper_loss=0.08806, over 19016.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01192, ecapa_loss=0.0002383, whisper_loss=0.09361, over 3730574.33 frames. ], batch size: 77, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:01:21,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=587220.0, ans=0.125 2024-08-10 14:01:40,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=587420.0, ans=0.0 2024-08-10 14:01:48,366 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 14:01:54,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=587520.0, ans=0.0 2024-08-10 14:02:15,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=587620.0, ans=0.125 2024-08-10 14:02:19,084 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 from AS 2024-08-10 14:02:21,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 800, loss[loss=0.1388, beats_loss=0.008913, ecapa_loss=0.0002739, whisper_loss=0.1272, over 22654.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01185, ecapa_loss=0.0002384, whisper_loss=0.09392, over 3750873.64 frames.
], batch size: 89, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:02:23,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=587720.0, ans=10.0 2024-08-10 14:02:46,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-10 14:02:51,791 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 14:02:53,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=587920.0, ans=0.125 2024-08-10 14:02:58,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. limit=22.5 2024-08-10 14:03:01,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.857e+01 3.282e+01 4.072e+01 6.223e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 14:03:33,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 850, loss[loss=0.1257, beats_loss=0.006974, ecapa_loss=0.0003013, whisper_loss=0.1157, over 21590.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.000237, whisper_loss=0.09443, over 3788785.05 frames. ], batch size: 87, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:03:45,347 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 14:03:58,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588320.0, ans=0.1 2024-08-10 14:04:00,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-10 14:04:04,526 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
12 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 14:04:11,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=588420.0, ans=0.125 2024-08-10 14:04:20,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=588520.0, ans=0.125 2024-08-10 14:04:28,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-08-10 14:04:50,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 900, loss[loss=0.1077, beats_loss=0.01431, ecapa_loss=0.0001676, whisper_loss=0.09171, over 16544.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01182, ecapa_loss=0.0002369, whisper_loss=0.09419, over 3798399.42 frames. ], batch size: 63, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:04:55,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=588720.0, ans=0.0 2024-08-10 14:04:55,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=588720.0, ans=0.125 2024-08-10 14:05:07,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=588820.0, ans=0.2 2024-08-10 14:05:11,940 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:05:22,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=588920.0, ans=0.125 2024-08-10 14:05:32,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.732e+01 3.108e+01 3.625e+01 6.653e+01, threshold=6.216e+01, percent-clipped=1.0 2024-08-10 14:05:32,192 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts.
24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-10 14:05:39,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=589020.0, ans=0.125 2024-08-10 14:05:46,395 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 14:05:50,333 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 11 from Vox, 34 from AS 2024-08-10 14:06:05,863 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 950, loss[loss=0.108, beats_loss=0.0133, ecapa_loss=0.0002236, whisper_loss=0.09248, over 22260.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01179, ecapa_loss=0.0002365, whisper_loss=0.09447, over 3790850.95 frames. ], batch size: 86, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:06:19,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=589320.0, ans=0.125 2024-08-10 14:06:40,657 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 from AS 2024-08-10 14:06:51,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2024-08-10 14:06:55,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.30 vs. limit=22.5 2024-08-10 14:07:21,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1000, loss[loss=0.1127, beats_loss=0.01137, ecapa_loss=0.0002452, whisper_loss=0.09884, over 20922.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0119, ecapa_loss=0.0002352, whisper_loss=0.0944, over 3829219.71 frames.
], batch size: 83, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:07:22,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=589720.0, ans=0.0 2024-08-10 14:07:24,832 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 10 from Vox, 40 from AS 2024-08-10 14:07:35,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=589820.0, ans=0.125 2024-08-10 14:07:35,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=589820.0, ans=0.125 2024-08-10 14:07:55,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589920.0, ans=0.125 2024-08-10 14:07:56,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=589920.0, ans=0.125 2024-08-10 14:08:04,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.773e+01 3.202e+01 3.484e+01 8.284e+01, threshold=6.403e+01, percent-clipped=2.0 2024-08-10 14:08:36,729 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 34 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 14:08:37,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1050, loss[loss=0.1373, beats_loss=0.009821, ecapa_loss=0.0002093, whisper_loss=0.1254, over 22225.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01194, ecapa_loss=0.0002352, whisper_loss=0.09424, over 3870735.23 frames. ], batch size: 82, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:08:57,300 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-10 14:09:05,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs.
limit=22.5 2024-08-10 14:09:14,236 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 14:09:21,210 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 from AS 2024-08-10 14:09:32,903 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 from AS 2024-08-10 14:09:51,343 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-10 14:09:55,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1100, loss[loss=0.09651, beats_loss=0.01291, ecapa_loss=0.0001854, whisper_loss=0.08174, over 23286.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01193, ecapa_loss=0.0002343, whisper_loss=0.09399, over 3857026.85 frames. ], batch size: 94, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:10:15,645 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 from AS 2024-08-10 14:10:18,731 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 14:10:29,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=590920.0, ans=0.2 2024-08-10 14:10:30,806 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS 2024-08-10 14:10:37,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=590920.0, ans=0.0 2024-08-10 14:10:41,039 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
33 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 14:10:42,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.893e+01 3.257e+01 3.748e+01 6.503e+01, threshold=6.515e+01, percent-clipped=1.0 2024-08-10 14:10:42,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=590920.0, ans=0.125 2024-08-10 14:10:47,923 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 14:10:57,471 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 14:11:01,955 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 from AS 2024-08-10 14:11:02,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=591120.0, ans=0.0 2024-08-10 14:11:05,613 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:11:05,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591120.0, ans=0.125 2024-08-10 14:11:15,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1150, loss[loss=0.1015, beats_loss=0.01439, ecapa_loss=0.0002067, whisper_loss=0.08508, over 22714.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0119, ecapa_loss=0.000235, whisper_loss=0.09465, over 3889063.92 frames. ], batch size: 91, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:11:36,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=591320.0, ans=0.0 2024-08-10 14:11:53,024 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
31 from LS+wenet, 15 from Vox, 43 from AS 2024-08-10 14:11:56,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=591420.0, ans=0.0 2024-08-10 14:12:03,590 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 14:12:05,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=591520.0, ans=0.2 2024-08-10 14:12:12,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=591620.0, ans=0.0 2024-08-10 14:12:29,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1200, loss[loss=0.1151, beats_loss=0.01293, ecapa_loss=0.00019, whisper_loss=0.1002, over 19285.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0119, ecapa_loss=0.0002344, whisper_loss=0.0949, over 3873182.19 frames. ], batch size: 75, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:12:31,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=591720.0, ans=0.1 2024-08-10 14:12:33,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-08-10 14:12:53,644 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 from AS 2024-08-10 14:13:07,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs.
limit=15.0 2024-08-10 14:13:12,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.894e+01 3.360e+01 3.999e+01 6.251e+01, threshold=6.719e+01, percent-clipped=0.0 2024-08-10 14:13:32,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=592120.0, ans=0.125 2024-08-10 14:13:46,376 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1250, loss[loss=0.1213, beats_loss=0.0113, ecapa_loss=0.0002184, whisper_loss=0.1078, over 23661.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01191, ecapa_loss=0.0002325, whisper_loss=0.09476, over 3878289.69 frames. ], batch size: 90, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:13:51,566 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 8 from Vox, 31 from AS 2024-08-10 14:13:56,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592220.0, ans=0.1 2024-08-10 14:14:08,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=592320.0, ans=10.0 2024-08-10 14:14:12,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=592320.0, ans=0.07 2024-08-10 14:14:23,716 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 from AS 2024-08-10 14:14:43,175 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 14:14:53,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=592620.0, ans=0.125 2024-08-10 14:15:03,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1300, loss[loss=0.107, beats_loss=0.01248, ecapa_loss=0.0001936, whisper_loss=0.09258, over 17935.00 frames.
], tot_loss[loss=0.1094, beats_loss=0.01187, ecapa_loss=0.0002323, whisper_loss=0.09525, over 3902815.34 frames. ], batch size: 68, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:15:05,596 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 14:15:06,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.11 vs. limit=22.5 2024-08-10 14:15:50,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.731e+01 3.070e+01 3.519e+01 6.243e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 14:16:24,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1350, loss[loss=0.1191, beats_loss=0.0104, ecapa_loss=0.0001749, whisper_loss=0.1069, over 18052.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01177, ecapa_loss=0.0002324, whisper_loss=0.09524, over 3872387.53 frames. ], batch size: 66, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:16:26,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=593220.0, ans=0.5 2024-08-10 14:16:32,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593220.0, ans=0.1 2024-08-10 14:16:39,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-08-10 14:16:45,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=593320.0, ans=0.125 2024-08-10 14:16:52,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=593320.0, ans=0.125 2024-08-10 14:17:39,397 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 14:17:39,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593620.0, ans=0.1 2024-08-10 14:17:44,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1400, loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0002415, whisper_loss=0.08826, over 15174.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01174, ecapa_loss=0.0002336, whisper_loss=0.09485, over 3875354.72 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:17:49,811 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 14:18:03,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=593820.0, ans=0.5 2024-08-10 14:18:18,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=593920.0, ans=0.125 2024-08-10 14:18:23,984 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 14:18:25,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.747e+01 3.189e+01 3.732e+01 5.782e+01, threshold=6.377e+01, percent-clipped=0.0 2024-08-10 14:18:31,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=594020.0, ans=0.0 2024-08-10 14:18:45,231 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 14:18:56,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1450, loss[loss=0.09853, beats_loss=0.01058, ecapa_loss=0.0002347, whisper_loss=0.0856, over 16854.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01177, ecapa_loss=0.0002326, whisper_loss=0.0955, over 3900356.44 frames. 
], batch size: 65, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:19:41,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2024-08-10 14:19:42,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=594320.0, ans=0.0 2024-08-10 14:19:44,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594320.0, ans=0.125 2024-08-10 14:19:47,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=594320.0, ans=0.2 2024-08-10 14:19:58,030 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 14:20:06,962 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:20:29,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=594620.0, ans=0.2 2024-08-10 14:20:34,956 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 14:20:36,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-10 14:20:38,066 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 14:20:40,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1500, loss[loss=0.1217, beats_loss=0.01118, ecapa_loss=0.0002243, whisper_loss=0.1082, over 15209.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01182, ecapa_loss=0.0002328, whisper_loss=0.09473, over 3854192.07 frames. 
], batch size: 57, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:20:41,001 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 14:20:51,376 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.17 vs. limit=15.0 2024-08-10 14:20:57,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=594820.0, ans=0.125 2024-08-10 14:20:58,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=594820.0, ans=0.04949747468305833 2024-08-10 14:21:10,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=594920.0, ans=0.2 2024-08-10 14:21:12,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=594920.0, ans=0.125 2024-08-10 14:21:15,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=594920.0, ans=0.2 2024-08-10 14:21:24,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.686e+01 3.011e+01 3.504e+01 1.040e+02, threshold=6.023e+01, percent-clipped=2.0 2024-08-10 14:21:38,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=595020.0, ans=0.125 2024-08-10 14:21:47,658 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 14:21:49,041 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-10 14:21:59,056 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1550, loss[loss=0.08033, beats_loss=0.01224, ecapa_loss=0.0002378, whisper_loss=0.0657, over 22007.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.0118, ecapa_loss=0.0002336, whisper_loss=0.09437, over 3823283.77 frames. ], batch size: 88, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:22:21,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=595320.0, ans=0.125 2024-08-10 14:22:29,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=595420.0, ans=0.125 2024-08-10 14:22:37,019 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:22:46,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=595520.0, ans=0.0 2024-08-10 14:22:51,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-10 14:22:52,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595520.0, ans=0.1 2024-08-10 14:23:13,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.51 vs. limit=22.5 2024-08-10 14:23:15,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1600, loss[loss=0.1167, beats_loss=0.009678, ecapa_loss=0.0002168, whisper_loss=0.1049, over 18142.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01176, ecapa_loss=0.0002335, whisper_loss=0.09414, over 3806428.36 frames. ], batch size: 67, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:23:26,989 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 14:23:34,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=595820.0, ans=0.125 2024-08-10 14:23:59,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.802e+01 3.147e+01 3.611e+01 5.289e+01, threshold=6.294e+01, percent-clipped=0.0 2024-08-10 14:24:01,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=596020.0, ans=0.125 2024-08-10 14:24:04,490 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 14:24:06,034 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 14:24:16,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2024-08-10 14:24:17,518 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 14:24:30,889 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 35 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 14:24:37,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1650, loss[loss=0.1209, beats_loss=0.012, ecapa_loss=0.000234, whisper_loss=0.1066, over 16114.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01174, ecapa_loss=0.0002334, whisper_loss=0.09527, over 3855653.04 frames. 
], batch size: 61, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:24:42,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=596220.0, ans=6.0 2024-08-10 14:24:49,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=596220.0, ans=0.07 2024-08-10 14:25:06,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596420.0, ans=0.1 2024-08-10 14:25:06,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=596420.0, ans=0.0 2024-08-10 14:25:21,046 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-10 14:25:22,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=596520.0, ans=0.125 2024-08-10 14:25:27,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=596520.0, ans=0.2 2024-08-10 14:25:52,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1700, loss[loss=0.1144, beats_loss=0.01389, ecapa_loss=0.0002632, whisper_loss=0.09793, over 21725.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01173, ecapa_loss=0.0002321, whisper_loss=0.09491, over 3818911.49 frames. 
], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:26:00,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=596720.0, ans=10.0 2024-08-10 14:26:15,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=596820.0, ans=0.125 2024-08-10 14:26:34,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.744e+01 3.070e+01 3.564e+01 5.631e+01, threshold=6.139e+01, percent-clipped=0.0 2024-08-10 14:27:08,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1750, loss[loss=0.1026, beats_loss=0.01096, ecapa_loss=0.0002146, whisper_loss=0.08953, over 16175.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0117, ecapa_loss=0.0002323, whisper_loss=0.09486, over 3808829.22 frames. ], batch size: 64, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:27:11,508 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 14:27:35,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=597320.0, ans=0.125 2024-08-10 14:28:06,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-08-10 14:28:10,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=597620.0, ans=0.125 2024-08-10 14:28:19,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.73 vs. limit=10.0 2024-08-10 14:28:26,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1800, loss[loss=0.09609, beats_loss=0.01147, ecapa_loss=0.0002622, whisper_loss=0.082, over 16534.00 frames. 
], tot_loss[loss=0.1087, beats_loss=0.01172, ecapa_loss=0.0002311, whisper_loss=0.09464, over 3802056.90 frames. ], batch size: 69, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:28:26,901 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 14:28:28,371 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 14:28:32,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-10 14:28:33,630 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 14:28:33,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=597720.0, ans=0.2 2024-08-10 14:28:35,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=597720.0, ans=0.0 2024-08-10 14:28:51,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=597820.0, ans=0.125 2024-08-10 14:28:56,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=597920.0, ans=0.0 2024-08-10 14:29:07,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.774e+01 3.082e+01 3.729e+01 4.718e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-10 14:29:40,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1850, loss[loss=0.1074, beats_loss=0.01098, ecapa_loss=0.0002804, whisper_loss=0.09366, over 21753.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01167, ecapa_loss=0.0002319, whisper_loss=0.09485, over 3805438.56 frames. ], batch size: 88, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:29:41,251 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
35 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-10 14:29:48,408 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 14:29:58,742 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 14:30:05,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=598320.0, ans=0.2 2024-08-10 14:30:12,726 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 14:30:17,516 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.035e-01 2024-08-10 14:30:41,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=598620.0, ans=0.07 2024-08-10 14:30:57,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1900, loss[loss=0.1245, beats_loss=0.008717, ecapa_loss=0.0003457, whisper_loss=0.1123, over 18619.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0118, ecapa_loss=0.0002368, whisper_loss=0.09418, over 3819911.90 frames. ], batch size: 76, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:31:04,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598720.0, ans=0.1 2024-08-10 14:31:12,272 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-10 14:31:15,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=598820.0, ans=0.0 2024-08-10 14:31:35,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=598920.0, ans=0.125 2024-08-10 14:31:41,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.853e+01 3.252e+01 3.827e+01 6.548e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-10 14:31:45,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-08-10 14:31:49,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=599020.0, ans=0.5 2024-08-10 14:32:05,547 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 9 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 14:32:07,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=599120.0, ans=0.125 2024-08-10 14:32:09,995 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 15 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 14:32:14,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 1950, loss[loss=0.106, beats_loss=0.01271, ecapa_loss=0.0002322, whisper_loss=0.09095, over 23877.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0118, ecapa_loss=0.0002389, whisper_loss=0.0941, over 3824758.41 frames. ], batch size: 94, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:32:26,934 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 14:32:33,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.33 vs. 
limit=15.0 2024-08-10 14:32:50,378 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 14:32:56,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=599420.0, ans=0.2 2024-08-10 14:33:03,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=599520.0, ans=0.07 2024-08-10 14:33:05,080 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-10 14:33:10,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=599520.0, ans=0.05 2024-08-10 14:33:18,153 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 19 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-10 14:33:18,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=599620.0, ans=0.125 2024-08-10 14:33:30,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2000, loss[loss=0.1103, beats_loss=0.01004, ecapa_loss=0.0002859, whisper_loss=0.09742, over 17784.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01182, ecapa_loss=0.0002417, whisper_loss=0.09412, over 3801002.00 frames. ], batch size: 72, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:33:41,131 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 14:33:52,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=599820.0, ans=0.125 2024-08-10 14:33:53,488 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 14:34:01,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599920.0, ans=0.1 2024-08-10 14:34:16,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.787e+01 3.156e+01 3.560e+01 5.120e+01, threshold=6.313e+01, percent-clipped=0.0 2024-08-10 14:34:19,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600020.0, ans=0.1 2024-08-10 14:34:31,574 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 14:34:50,049 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2050, loss[loss=0.1222, beats_loss=0.009947, ecapa_loss=0.0002731, whisper_loss=0.1095, over 20313.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01181, ecapa_loss=0.0002436, whisper_loss=0.09385, over 3797302.15 frames. ], batch size: 84, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:34:55,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=600220.0, ans=0.125 2024-08-10 14:35:00,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2024-08-10 14:35:00,774 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 14:35:27,465 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:35:33,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=600420.0, ans=0.125 2024-08-10 14:35:38,933 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 14:35:49,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=600620.0, ans=0.0 2024-08-10 14:35:50,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=600620.0, ans=0.125 2024-08-10 14:35:55,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=600620.0, ans=0.125 2024-08-10 14:36:04,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2100, loss[loss=0.1109, beats_loss=0.01146, ecapa_loss=0.0002288, whisper_loss=0.09719, over 18390.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0119, ecapa_loss=0.0002446, whisper_loss=0.09348, over 3814390.49 frames. ], batch size: 72, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:36:08,342 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 14:36:24,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=600820.0, ans=0.0 2024-08-10 14:36:32,169 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 14:36:42,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=600920.0, ans=0.025 2024-08-10 14:36:42,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0 2024-08-10 14:36:46,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.750e+01 3.110e+01 3.646e+01 5.998e+01, threshold=6.220e+01, percent-clipped=0.0 2024-08-10 14:36:54,314 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 14:36:54,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=601020.0, ans=0.0 2024-08-10 14:37:02,761 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 14:37:05,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-08-10 14:37:12,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2024-08-10 14:37:17,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=601120.0, ans=0.125 2024-08-10 14:37:19,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2150, loss[loss=0.1158, beats_loss=0.01364, ecapa_loss=0.0002416, whisper_loss=0.09972, over 22803.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01192, ecapa_loss=0.0002438, whisper_loss=0.09371, over 3805605.56 frames. ], batch size: 89, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:37:28,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=601220.0, ans=0.0 2024-08-10 14:37:47,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0 2024-08-10 14:37:55,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=601420.0, ans=0.2 2024-08-10 14:38:02,564 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 14:38:09,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2024-08-10 14:38:11,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2024-08-10 14:38:13,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2024-08-10 14:38:13,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=601520.0, ans=0.0 2024-08-10 14:38:20,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=601620.0, ans=0.0 2024-08-10 14:38:33,313 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 14:38:35,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2200, loss[loss=0.1198, beats_loss=0.00983, ecapa_loss=0.0002032, whisper_loss=0.1079, over 16528.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01189, ecapa_loss=0.0002441, whisper_loss=0.09428, over 3822361.74 frames. ], batch size: 61, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:38:36,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=601720.0, ans=0.0 2024-08-10 14:38:40,659 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 14:38:42,319 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:38:48,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=15.0 2024-08-10 14:38:49,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-10 14:39:04,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=601920.0, ans=0.125 2024-08-10 14:39:05,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=601920.0, ans=0.0 2024-08-10 14:39:08,109 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 14:39:14,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.850e+01 3.154e+01 3.768e+01 5.598e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 14:39:21,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=602020.0, ans=0.0 2024-08-10 14:39:23,834 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 14:39:34,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602120.0, ans=0.125 2024-08-10 14:39:39,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.00 vs. limit=10.0 2024-08-10 14:39:42,025 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 14:39:42,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2250, loss[loss=0.07986, beats_loss=0.01689, ecapa_loss=0.0002416, whisper_loss=0.06056, over 13548.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01187, ecapa_loss=0.0002445, whisper_loss=0.09525, over 3814918.55 frames. 
], batch size: 58, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:39:49,519 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 14:39:57,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602320.0, ans=0.1 2024-08-10 14:39:57,560 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:39:58,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=602320.0, ans=0.07 2024-08-10 14:40:06,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=602320.0, ans=0.125 2024-08-10 14:40:11,169 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-10 14:40:16,498 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 14:40:22,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=602520.0, ans=0.125 2024-08-10 14:40:36,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=602620.0, ans=0.125 2024-08-10 14:40:37,043 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 14:40:43,540 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 14:40:47,075 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2300, loss[loss=0.1072, beats_loss=0.01124, ecapa_loss=0.0002561, whisper_loss=0.09337, over 16499.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01188, ecapa_loss=0.0002458, whisper_loss=0.09559, over 3850622.05 frames. 
], batch size: 66, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:40:49,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-10 14:40:52,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=602720.0, ans=0.125 2024-08-10 14:40:59,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-08-10 14:41:01,271 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 14:41:02,507 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:41:23,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.867e+01 3.175e+01 3.741e+01 6.464e+01, threshold=6.350e+01, percent-clipped=1.0 2024-08-10 14:41:26,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=603020.0, ans=0.0 2024-08-10 14:41:29,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=603020.0, ans=0.125 2024-08-10 14:41:32,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-10 14:41:43,838 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 14:41:51,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2350, loss[loss=0.1199, beats_loss=0.0107, ecapa_loss=0.0001917, whisper_loss=0.1073, over 18499.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01184, ecapa_loss=0.0002456, whisper_loss=0.09563, over 3861501.35 frames. 
], batch size: 66, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:42:09,284 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-10 14:42:12,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=603320.0, ans=0.125 2024-08-10 14:42:19,804 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 14:42:41,561 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 14:42:45,530 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 14:42:49,178 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 14:42:55,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2400, loss[loss=0.0926, beats_loss=0.01188, ecapa_loss=0.0003296, whisper_loss=0.07743, over 19734.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01172, ecapa_loss=0.0002478, whisper_loss=0.09557, over 3872339.63 frames. ], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:43:04,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=603720.0, ans=0.07 2024-08-10 14:43:18,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2024-08-10 14:43:24,267 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 14:43:29,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=603920.0, ans=0.125 2024-08-10 14:43:31,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.726e+01 3.127e+01 3.676e+01 5.177e+01, threshold=6.255e+01, percent-clipped=0.0 2024-08-10 14:43:33,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=604020.0, ans=0.125 2024-08-10 14:43:45,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=604020.0, ans=0.0 2024-08-10 14:44:00,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2450, loss[loss=0.1281, beats_loss=0.01293, ecapa_loss=0.0002153, whisper_loss=0.1131, over 22955.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01172, ecapa_loss=0.0002453, whisper_loss=0.09624, over 3908989.93 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:44:07,826 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=15.0 2024-08-10 14:44:11,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=604220.0, ans=0.125 2024-08-10 14:44:13,538 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-10 14:44:21,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=604320.0, ans=0.2 2024-08-10 14:44:25,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=604420.0, ans=0.2 2024-08-10 14:44:42,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2024-08-10 14:44:49,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=604520.0, ans=0.0 2024-08-10 14:44:58,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.52 vs. limit=5.0 2024-08-10 14:45:02,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=604620.0, ans=0.0 2024-08-10 14:45:05,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2500, loss[loss=0.1266, beats_loss=0.01275, ecapa_loss=0.000203, whisper_loss=0.1118, over 22111.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01183, ecapa_loss=0.0002455, whisper_loss=0.09548, over 3878024.17 frames. ], batch size: 85, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:45:11,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=604720.0, ans=0.09899494936611666 2024-08-10 14:45:13,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=604720.0, ans=0.125 2024-08-10 14:45:27,059 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 14:45:34,518 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-10 14:45:38,386 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 14:45:42,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.835e+01 3.123e+01 3.643e+01 5.985e+01, threshold=6.245e+01, percent-clipped=0.0 2024-08-10 14:46:02,063 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 14:46:07,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-10 14:46:11,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2550, loss[loss=0.1103, beats_loss=0.009297, ecapa_loss=0.0002066, whisper_loss=0.09898, over 17577.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01182, ecapa_loss=0.0002457, whisper_loss=0.0957, over 3876878.45 frames. ], batch size: 64, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:46:23,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605320.0, ans=0.1 2024-08-10 14:46:25,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=605320.0, ans=0.0 2024-08-10 14:46:40,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=605420.0, ans=0.125 2024-08-10 14:46:50,606 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 19 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 14:47:15,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2600, loss[loss=0.08347, beats_loss=0.0114, ecapa_loss=0.0002741, whisper_loss=0.06933, over 15798.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01174, ecapa_loss=0.0002458, whisper_loss=0.09624, over 3874550.67 frames. 
], batch size: 64, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:47:33,689 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 14:47:36,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=605820.0, ans=0.0 2024-08-10 14:47:48,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-08-10 14:47:51,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.738e+01 3.065e+01 3.602e+01 6.052e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 14:47:56,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=606020.0, ans=0.0 2024-08-10 14:47:57,594 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 14:48:01,735 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:48:01,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=606020.0, ans=0.125 2024-08-10 14:48:03,350 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:48:13,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=606120.0, ans=0.0 2024-08-10 14:48:15,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=606120.0, ans=0.125 2024-08-10 14:48:19,596 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-10 14:48:20,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2650, loss[loss=0.1133, beats_loss=0.01102, ecapa_loss=0.0003061, whisper_loss=0.09923, over 20297.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0117, ecapa_loss=0.0002462, whisper_loss=0.09661, over 3852190.96 frames. ], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:48:23,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=606220.0, ans=0.125 2024-08-10 14:48:27,229 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 14:48:28,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=606220.0, ans=0.125 2024-08-10 14:48:37,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=12.0 2024-08-10 14:48:39,036 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 14:48:51,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=606420.0, ans=0.125 2024-08-10 14:49:18,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=606620.0, ans=0.125 2024-08-10 14:49:25,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2700, loss[loss=0.09906, beats_loss=0.01188, ecapa_loss=0.0002589, whisper_loss=0.08459, over 19842.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01178, ecapa_loss=0.0002457, whisper_loss=0.09628, over 3872167.74 frames. 
], batch size: 81, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:49:34,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=606720.0, ans=0.125 2024-08-10 14:49:35,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=606720.0, ans=0.0 2024-08-10 14:49:42,965 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 14:49:45,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=606820.0, ans=0.125 2024-08-10 14:49:48,190 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 14:49:53,304 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 14:49:57,390 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 14:50:00,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-10 14:50:02,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 3.024e+01 3.379e+01 4.188e+01 8.555e+01, threshold=6.757e+01, percent-clipped=2.0 2024-08-10 14:50:03,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-10 14:50:12,777 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 14:50:14,403 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.996e-01 2024-08-10 14:50:30,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2750, loss[loss=0.1309, beats_loss=0.01298, ecapa_loss=0.0001695, whisper_loss=0.1163, over 17567.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01176, ecapa_loss=0.0002454, whisper_loss=0.09615, over 3853065.32 frames. ], batch size: 64, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:50:36,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607220.0, ans=0.125 2024-08-10 14:50:53,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-08-10 14:50:57,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607420.0, ans=0.1 2024-08-10 14:51:01,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607420.0, ans=0.1 2024-08-10 14:51:16,157 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 14:51:19,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-10 14:51:36,986 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2800, loss[loss=0.09776, beats_loss=0.01281, ecapa_loss=0.0002417, whisper_loss=0.08254, over 21767.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01186, ecapa_loss=0.0002429, whisper_loss=0.09583, over 3844339.33 frames. 
], batch size: 90, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:51:42,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=607720.0, ans=0.0 2024-08-10 14:51:44,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=607720.0, ans=0.2 2024-08-10 14:51:47,725 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 14:51:49,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=607820.0, ans=0.0 2024-08-10 14:52:14,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.765e+01 3.202e+01 3.631e+01 5.642e+01, threshold=6.403e+01, percent-clipped=0.0 2024-08-10 14:52:19,329 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 14:52:29,804 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 14:52:39,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=608120.0, ans=0.125 2024-08-10 14:52:42,560 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2850, loss[loss=0.09975, beats_loss=0.01479, ecapa_loss=0.0001877, whisper_loss=0.08308, over 22687.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01192, ecapa_loss=0.0002423, whisper_loss=0.0955, over 3881599.93 frames. 
], batch size: 90, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:52:47,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=608220.0, ans=0.2 2024-08-10 14:53:00,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=608320.0, ans=0.125 2024-08-10 14:53:04,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=608320.0, ans=0.125 2024-08-10 14:53:05,459 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 14:53:08,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=608420.0, ans=0.02 2024-08-10 14:53:20,559 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 14:53:21,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2024-08-10 14:53:26,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=608520.0, ans=0.0 2024-08-10 14:53:29,068 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2024-08-10 14:53:36,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=608620.0, ans=0.2 2024-08-10 14:53:41,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608620.0, ans=0.125 2024-08-10 14:53:47,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2900, loss[loss=0.1057, beats_loss=0.01167, ecapa_loss=0.0002212, whisper_loss=0.09177, over 18989.00 frames. 
], tot_loss[loss=0.1101, beats_loss=0.01196, ecapa_loss=0.0002431, whisper_loss=0.09568, over 3877293.75 frames. ], batch size: 73, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:54:03,893 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 14:54:04,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=608820.0, ans=0.0 2024-08-10 14:54:06,430 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 14:54:11,489 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 14:54:16,976 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 14:54:24,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.824e+01 3.286e+01 3.731e+01 5.146e+01, threshold=6.573e+01, percent-clipped=0.0 2024-08-10 14:54:26,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=609020.0, ans=0.035 2024-08-10 14:54:27,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=609020.0, ans=0.125 2024-08-10 14:54:36,825 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 27 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 14:54:40,968 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 14:54:43,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=609120.0, ans=0.0 2024-08-10 14:54:46,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=609120.0, ans=0.2 2024-08-10 14:54:46,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=609120.0, ans=0.05 2024-08-10 14:54:46,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=15.0 2024-08-10 14:54:48,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5 2024-08-10 14:54:53,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 2950, loss[loss=0.09784, beats_loss=0.0122, ecapa_loss=0.0003093, whisper_loss=0.08255, over 15043.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01189, ecapa_loss=0.0002434, whisper_loss=0.09626, over 3875972.35 frames. ], batch size: 62, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:54:56,543 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 14:55:00,663 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 14:55:05,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-10 14:55:06,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=609320.0, ans=0.125 2024-08-10 14:55:09,400 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
21 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-10 14:55:09,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-08-10 14:55:21,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609420.0, ans=0.1 2024-08-10 14:55:26,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=609420.0, ans=0.125 2024-08-10 14:55:26,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=609420.0, ans=0.1 2024-08-10 14:55:26,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2024-08-10 14:55:34,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.20 vs. limit=22.5 2024-08-10 14:55:41,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-10 14:55:53,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=609620.0, ans=0.125 2024-08-10 14:55:55,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0 2024-08-10 14:55:58,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3000, loss[loss=0.1113, beats_loss=0.007764, ecapa_loss=0.0002557, whisper_loss=0.101, over 17911.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01189, ecapa_loss=0.0002443, whisper_loss=0.09625, over 3899651.69 frames. 
], batch size: 68, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:55:58,283 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 14:56:35,297 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on ASR_libri: loss=0.2643, beats_loss=0, ecapa_loss=0.0007548, whisper_loss=0.2568, over 922467.00 frames. 2024-08-10 14:56:52,761 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on SV_voxceleb1: loss=0.006405, beats_loss=0, ecapa_loss=0.0006405, whisper_loss=0, over 939242.00 frames. 2024-08-10 14:58:43,644 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on AT_audioset: loss=0.02683, beats_loss=0.02683, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 14:58:43,648 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 14:58:47,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=609720.0, ans=0.0 2024-08-10 14:58:48,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0 2024-08-10 14:58:56,370 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 14:59:04,521 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 14:59:06,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5 2024-08-10 14:59:07,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=609820.0, ans=0.0 2024-08-10 14:59:14,917 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 14:59:16,192 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 14:59:19,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.032e+01 3.491e+01 3.911e+01 5.761e+01, threshold=6.982e+01, percent-clipped=0.0 2024-08-10 14:59:21,542 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 14:59:23,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610020.0, ans=0.125 2024-08-10 14:59:48,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-10 14:59:48,805 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3050, loss[loss=0.1046, beats_loss=0.01312, ecapa_loss=0.0002457, whisper_loss=0.08898, over 23523.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01188, ecapa_loss=0.0002447, whisper_loss=0.09659, over 3919195.77 frames. ], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:59:53,037 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:59:53,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=610220.0, ans=0.2 2024-08-10 14:59:54,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=610220.0, ans=0.05 2024-08-10 15:00:14,767 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 15:00:27,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=610520.0, ans=0.0 2024-08-10 15:00:29,282 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 15:00:33,303 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:00:40,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=610620.0, ans=0.125 2024-08-10 15:00:41,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-08-10 15:00:43,774 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 15:00:48,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=12.0 2024-08-10 15:00:54,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3100, loss[loss=0.1176, beats_loss=0.0099, ecapa_loss=0.000222, whisper_loss=0.1055, over 14973.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01183, ecapa_loss=0.0002462, whisper_loss=0.09659, over 3902366.83 frames. ], batch size: 59, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:01:07,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=610820.0, ans=0.0 2024-08-10 15:01:18,047 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 15:01:28,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=610920.0, ans=0.125 2024-08-10 15:01:29,362 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 15:01:33,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.743e+01 3.024e+01 3.559e+01 5.609e+01, threshold=6.048e+01, percent-clipped=0.0 2024-08-10 15:01:33,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=610920.0, ans=0.125 2024-08-10 15:01:42,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=611020.0, ans=0.125 2024-08-10 15:02:02,237 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 15:02:03,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3150, loss[loss=0.1061, beats_loss=0.01175, ecapa_loss=0.0002622, whisper_loss=0.09176, over 21565.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01182, ecapa_loss=0.0002464, whisper_loss=0.09687, over 3911443.60 frames. ], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:02:13,629 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:02:22,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2024-08-10 15:02:23,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=611320.0, ans=0.125 2024-08-10 15:02:24,094 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 15:02:24,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=611320.0, ans=0.09899494936611666 2024-08-10 15:02:36,886 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 15:02:38,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-10 15:02:43,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=611420.0, ans=0.125 2024-08-10 15:02:48,056 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 32 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 15:03:00,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=611620.0, ans=0.0 2024-08-10 15:03:03,111 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:03:09,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=611620.0, ans=0.0 2024-08-10 15:03:13,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-10 15:03:15,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3200, loss[loss=0.1154, beats_loss=0.01006, ecapa_loss=0.0002688, whisper_loss=0.1027, over 19739.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01175, ecapa_loss=0.0002442, whisper_loss=0.09792, over 3894630.54 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:03:18,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.86 vs. limit=15.0 2024-08-10 15:03:33,749 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 15:03:56,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.779e+01 3.150e+01 3.545e+01 6.901e+01, threshold=6.301e+01, percent-clipped=2.0 2024-08-10 15:04:03,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=612020.0, ans=0.0 2024-08-10 15:04:07,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2024-08-10 15:04:09,619 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 15:04:11,694 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 15:04:28,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3250, loss[loss=0.1046, beats_loss=0.01058, ecapa_loss=0.0002569, whisper_loss=0.0915, over 21429.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01184, ecapa_loss=0.0002458, whisper_loss=0.09731, over 3894565.67 frames. ], batch size: 87, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:04:52,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=612320.0, ans=0.0 2024-08-10 15:04:58,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=612420.0, ans=10.0 2024-08-10 15:05:23,430 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 15:05:27,500 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 15:05:40,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3300, loss[loss=0.1224, beats_loss=0.01163, ecapa_loss=0.0002139, whisper_loss=0.1086, over 22403.00 frames. 
], tot_loss[loss=0.1111, beats_loss=0.01185, ecapa_loss=0.0002459, whisper_loss=0.09676, over 3873562.23 frames. ], batch size: 89, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:05:48,173 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 15:06:07,120 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 31 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 15:06:07,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=612820.0, ans=0.0 2024-08-10 15:06:07,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=612820.0, ans=0.0 2024-08-10 15:06:22,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.755e+01 3.072e+01 3.647e+01 1.345e+02, threshold=6.143e+01, percent-clipped=1.0 2024-08-10 15:06:23,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613020.0, ans=0.125 2024-08-10 15:06:27,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=613020.0, ans=0.2 2024-08-10 15:06:54,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3350, loss[loss=0.08711, beats_loss=0.01544, ecapa_loss=0.0001483, whisper_loss=0.07019, over 16241.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0118, ecapa_loss=0.0002462, whisper_loss=0.09668, over 3873918.01 frames. 
], batch size: 60, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:07:11,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=613320.0, ans=0.125 2024-08-10 15:07:20,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=613320.0, ans=0.05 2024-08-10 15:07:25,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=613420.0, ans=0.125 2024-08-10 15:07:33,181 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 15:07:33,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=613420.0, ans=0.0 2024-08-10 15:07:34,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=613420.0, ans=0.0 2024-08-10 15:07:36,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=613420.0, ans=0.125 2024-08-10 15:07:45,171 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 15:07:54,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-10 15:08:00,615 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 15:08:08,150 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3400, loss[loss=0.1074, beats_loss=0.01165, ecapa_loss=0.0002206, whisper_loss=0.09356, over 22634.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01179, ecapa_loss=0.0002432, whisper_loss=0.09686, over 3880845.98 frames. 
], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:08:16,746 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:08:25,888 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 15:08:49,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.884e+01 3.210e+01 3.796e+01 7.234e+01, threshold=6.419e+01, percent-clipped=1.0 2024-08-10 15:09:14,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=614120.0, ans=0.125 2024-08-10 15:09:21,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3450, loss[loss=0.09603, beats_loss=0.01218, ecapa_loss=0.0002266, whisper_loss=0.08159, over 14295.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01181, ecapa_loss=0.0002445, whisper_loss=0.0962, over 3872359.75 frames. ], batch size: 57, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:09:24,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=614220.0, ans=0.125 2024-08-10 15:09:36,934 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 15:09:48,620 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 15:10:06,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=614520.0, ans=0.125 2024-08-10 15:10:08,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=614520.0, ans=0.05 2024-08-10 15:10:08,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614520.0, ans=0.1 2024-08-10 15:10:17,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-08-10 15:10:34,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3500, loss[loss=0.07853, beats_loss=0.01245, ecapa_loss=0.0002795, whisper_loss=0.06328, over 18617.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01186, ecapa_loss=0.0002455, whisper_loss=0.09547, over 3860289.47 frames. ], batch size: 77, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:10:34,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=614720.0, ans=0.125 2024-08-10 15:10:34,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=614720.0, ans=0.125 2024-08-10 15:10:37,352 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 15:10:59,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=614820.0, ans=0.125 2024-08-10 15:11:03,677 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 15:11:05,128 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
22 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 15:11:15,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+01 2.754e+01 3.128e+01 3.525e+01 7.630e+01, threshold=6.256e+01, percent-clipped=1.0 2024-08-10 15:11:29,482 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 15:11:33,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=615120.0, ans=0.0 2024-08-10 15:11:35,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=615120.0, ans=0.125 2024-08-10 15:11:36,773 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 15:11:38,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615120.0, ans=0.125 2024-08-10 15:11:46,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3550, loss[loss=0.1126, beats_loss=0.01232, ecapa_loss=0.0002623, whisper_loss=0.09769, over 15193.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01191, ecapa_loss=0.0002442, whisper_loss=0.09463, over 3850699.28 frames. ], batch size: 60, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:11:55,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=615220.0, ans=0.125 2024-08-10 15:12:01,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=615320.0, ans=0.125 2024-08-10 15:12:01,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. 
limit=22.5 2024-08-10 15:12:58,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3600, loss[loss=0.1283, beats_loss=0.008442, ecapa_loss=0.0003202, whisper_loss=0.1166, over 20518.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01179, ecapa_loss=0.0002445, whisper_loss=0.09551, over 3859464.70 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:13:00,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615720.0, ans=0.125 2024-08-10 15:13:01,987 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 15:13:09,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615720.0, ans=0.1 2024-08-10 15:13:14,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=615820.0, ans=10.0 2024-08-10 15:13:25,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=615820.0, ans=0.2 2024-08-10 15:13:25,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-10 15:13:26,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=615820.0, ans=0.125 2024-08-10 15:13:39,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.884e+01 3.216e+01 3.547e+01 5.586e+01, threshold=6.432e+01, percent-clipped=0.0 2024-08-10 15:13:47,496 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 15:13:50,479 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 15:13:57,650 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 15:14:01,884 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 15:14:05,856 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 15:14:11,288 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3650, loss[loss=0.124, beats_loss=0.00893, ecapa_loss=0.0002409, whisper_loss=0.1127, over 18474.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01179, ecapa_loss=0.0002439, whisper_loss=0.09612, over 3839388.31 frames. ], batch size: 71, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:14:28,610 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-10 15:14:28,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=616320.0, ans=0.2 2024-08-10 15:14:33,102 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-10 15:14:38,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616420.0, ans=0.1 2024-08-10 15:14:45,856 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 15:15:08,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=616620.0, ans=0.125 2024-08-10 15:15:16,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=616620.0, ans=0.0 2024-08-10 15:15:23,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3700, loss[loss=0.09168, beats_loss=0.01399, ecapa_loss=0.0001672, whisper_loss=0.07602, over 22424.00 frames. 
], tot_loss[loss=0.1101, beats_loss=0.01184, ecapa_loss=0.0002432, whisper_loss=0.0958, over 3847691.19 frames. ], batch size: 85, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:15:28,052 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 15:15:30,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=616720.0, ans=0.2 2024-08-10 15:15:59,774 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 15:16:04,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=616920.0, ans=0.0 2024-08-10 15:16:05,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.736e+01 3.079e+01 3.558e+01 5.544e+01, threshold=6.157e+01, percent-clipped=0.0 2024-08-10 15:16:13,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=617020.0, ans=0.0 2024-08-10 15:16:31,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=617120.0, ans=0.0 2024-08-10 15:16:37,471 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3750, loss[loss=0.1221, beats_loss=0.01304, ecapa_loss=0.0002105, whisper_loss=0.107, over 15039.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01192, ecapa_loss=0.0002428, whisper_loss=0.09537, over 3853996.95 frames. ], batch size: 58, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:16:40,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=617220.0, ans=0.0 2024-08-10 15:16:45,964 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
16 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 15:16:47,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=617220.0, ans=0.0 2024-08-10 15:17:12,904 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-10 15:17:15,350 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 15:17:30,654 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:17:34,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-08-10 15:17:35,350 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.103e-02 2024-08-10 15:17:37,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=12.0 2024-08-10 15:17:43,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5 2024-08-10 15:17:49,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3800, loss[loss=0.1057, beats_loss=0.01429, ecapa_loss=0.0002203, whisper_loss=0.0892, over 19513.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01198, ecapa_loss=0.0002429, whisper_loss=0.09489, over 3824124.35 frames. ], batch size: 77, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:17:51,552 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 15:18:17,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=617920.0, ans=0.125 2024-08-10 15:18:30,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.844e+01 3.115e+01 3.732e+01 5.922e+01, threshold=6.230e+01, percent-clipped=0.0 2024-08-10 15:18:32,480 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 15:18:38,305 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 15:18:51,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=618120.0, ans=0.125 2024-08-10 15:19:02,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3850, loss[loss=0.126, beats_loss=0.01122, ecapa_loss=0.0002323, whisper_loss=0.1125, over 21233.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01205, ecapa_loss=0.0002421, whisper_loss=0.09429, over 3823801.24 frames. ], batch size: 79, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:19:03,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2024-08-10 15:19:05,170 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
35 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 15:19:07,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=618220.0, ans=0.125 2024-08-10 15:19:14,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=618220.0, ans=0.125 2024-08-10 15:19:26,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=618320.0, ans=0.125 2024-08-10 15:19:44,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-10 15:19:56,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-08-10 15:20:07,197 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 15:20:07,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0 2024-08-10 15:20:21,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=618620.0, ans=0.125 2024-08-10 15:20:31,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3900, loss[loss=0.0827, beats_loss=0.01527, ecapa_loss=0.0002262, whisper_loss=0.06517, over 14910.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01206, ecapa_loss=0.0002449, whisper_loss=0.09464, over 3831824.61 frames. ], batch size: 60, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:20:32,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=618720.0, ans=0.0 2024-08-10 15:20:41,441 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 15:21:05,945 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 15:21:23,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=618920.0, ans=0.0 2024-08-10 15:21:24,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 3.061e+01 3.504e+01 4.098e+01 1.751e+02, threshold=7.008e+01, percent-clipped=3.0 2024-08-10 15:21:30,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619020.0, ans=0.1 2024-08-10 15:21:32,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2024-08-10 15:21:55,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=619120.0, ans=0.0 2024-08-10 15:22:11,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 3950, loss[loss=0.1116, beats_loss=0.01155, ecapa_loss=0.0002295, whisper_loss=0.09777, over 20722.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01201, ecapa_loss=0.0002452, whisper_loss=0.09504, over 3882175.92 frames. 
], batch size: 82, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:23:11,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=619520.0, ans=0.02 2024-08-10 15:23:11,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=619520.0, ans=0.2 2024-08-10 15:23:29,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619520.0, ans=0.125 2024-08-10 15:23:54,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4000, loss[loss=0.09533, beats_loss=0.0126, ecapa_loss=0.0002404, whisper_loss=0.08032, over 23202.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01194, ecapa_loss=0.0002464, whisper_loss=0.09485, over 3872063.51 frames. ], batch size: 94, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:23:57,697 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-10 15:24:23,191 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 15:24:27,979 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 15:24:38,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=619820.0, ans=0.025 2024-08-10 15:25:02,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.861e+01 3.318e+01 3.884e+01 5.554e+01, threshold=6.636e+01, percent-clipped=0.0 2024-08-10 15:25:26,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620020.0, ans=0.125 2024-08-10 15:25:33,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=620120.0, ans=10.0 2024-08-10 15:25:35,525 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 15:25:44,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=620120.0, ans=0.1 2024-08-10 15:25:45,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=620120.0, ans=0.125 2024-08-10 15:25:52,351 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4050, loss[loss=0.1311, beats_loss=0.009015, ecapa_loss=0.0002618, whisper_loss=0.1194, over 23899.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01187, ecapa_loss=0.0002467, whisper_loss=0.09547, over 3881326.91 frames. ], batch size: 88, lr: 1.27e-02, grad_scale: 68719476736.0 2024-08-10 15:26:00,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2024-08-10 15:26:19,990 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 15:26:24,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=620320.0, ans=0.035 2024-08-10 15:26:27,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-10 15:26:30,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620320.0, ans=0.1 2024-08-10 15:26:37,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=620320.0, ans=0.1 2024-08-10 15:26:46,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=620420.0, ans=0.125 2024-08-10 15:26:48,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=620420.0, ans=0.025 2024-08-10 15:27:25,879 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 15:27:30,906 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 15:27:50,471 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4100, loss[loss=0.1077, beats_loss=0.01069, ecapa_loss=0.000248, whisper_loss=0.09454, over 13752.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01186, ecapa_loss=0.0002452, whisper_loss=0.09559, over 3892032.92 frames. 
], batch size: 56, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:27:59,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620720.0, ans=0.1 2024-08-10 15:28:13,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620820.0, ans=0.125 2024-08-10 15:28:23,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=620820.0, ans=0.035 2024-08-10 15:28:39,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620920.0, ans=0.1 2024-08-10 15:28:59,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+01 2.982e+01 3.358e+01 3.918e+01 5.492e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-10 15:29:00,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620920.0, ans=0.125 2024-08-10 15:29:14,858 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 9 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 15:29:34,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4150, loss[loss=0.1211, beats_loss=0.009823, ecapa_loss=0.0002861, whisper_loss=0.1085, over 22720.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01189, ecapa_loss=0.0002429, whisper_loss=0.09599, over 3906306.59 frames. ], batch size: 90, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:29:34,695 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 15:30:08,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=621420.0, ans=0.0 2024-08-10 15:30:13,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=621420.0, ans=0.125 2024-08-10 15:30:17,195 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 15:30:27,486 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:30:30,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=621520.0, ans=0.125 2024-08-10 15:30:48,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621720.0, ans=0.1 2024-08-10 15:30:49,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4200, loss[loss=0.1018, beats_loss=0.01195, ecapa_loss=0.0002101, whisper_loss=0.08776, over 18186.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01185, ecapa_loss=0.0002426, whisper_loss=0.09635, over 3884363.92 frames. ], batch size: 71, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:30:53,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. 
limit=15.0 2024-08-10 15:31:02,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621820.0, ans=0.1 2024-08-10 15:31:31,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.809e+01 3.141e+01 3.651e+01 6.704e+01, threshold=6.282e+01, percent-clipped=0.0 2024-08-10 15:31:40,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622020.0, ans=0.1 2024-08-10 15:31:51,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2024-08-10 15:31:52,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622120.0, ans=0.1 2024-08-10 15:32:04,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=622220.0, ans=0.0 2024-08-10 15:32:05,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4250, loss[loss=0.1019, beats_loss=0.0102, ecapa_loss=0.0003017, whisper_loss=0.08865, over 18235.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01182, ecapa_loss=0.0002443, whisper_loss=0.09646, over 3880033.11 frames. ], batch size: 76, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:32:20,476 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 15:32:21,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=622320.0, ans=15.0 2024-08-10 15:32:28,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622320.0, ans=0.1 2024-08-10 15:32:35,163 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 15:32:38,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=622420.0, ans=0.125 2024-08-10 15:32:39,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=622420.0, ans=0.0 2024-08-10 15:32:42,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=622420.0, ans=0.2 2024-08-10 15:32:44,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=622420.0, ans=0.0 2024-08-10 15:32:50,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622520.0, ans=0.1 2024-08-10 15:33:02,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=622520.0, ans=0.2 2024-08-10 15:33:06,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=622620.0, ans=15.0 2024-08-10 15:33:19,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4300, loss[loss=0.08643, beats_loss=0.00827, ecapa_loss=0.0003072, whisper_loss=0.07509, over 13820.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01179, ecapa_loss=0.0002444, whisper_loss=0.09628, over 3860575.35 frames. ], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:33:32,521 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 15:33:36,973 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 15:33:38,316 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 15:33:39,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.95 vs. limit=10.0 2024-08-10 15:33:40,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2024-08-10 15:33:59,923 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.798e+01 3.084e+01 3.774e+01 7.124e+01, threshold=6.168e+01, percent-clipped=2.0 2024-08-10 15:34:04,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-10 15:34:07,314 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 15:34:08,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=623020.0, ans=0.0 2024-08-10 15:34:12,667 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 15:34:28,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=623120.0, ans=0.035 2024-08-10 15:34:30,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4350, loss[loss=0.1149, beats_loss=0.01486, ecapa_loss=0.0002733, whisper_loss=0.09735, over 21623.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01183, ecapa_loss=0.0002454, whisper_loss=0.09562, over 3847820.55 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:34:40,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=623220.0, ans=0.0 2024-08-10 15:34:53,277 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
28 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 15:35:09,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=623420.0, ans=0.125 2024-08-10 15:35:20,475 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-10 15:35:40,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=623620.0, ans=0.125 2024-08-10 15:35:48,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-08-10 15:35:50,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4400, loss[loss=0.1036, beats_loss=0.01268, ecapa_loss=0.0002055, whisper_loss=0.08883, over 18766.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0118, ecapa_loss=0.0002454, whisper_loss=0.09582, over 3846091.39 frames. ], batch size: 73, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:35:51,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=623720.0, ans=0.125 2024-08-10 15:35:56,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=12.0 2024-08-10 15:35:58,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2024-08-10 15:36:11,765 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:36:13,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623820.0, ans=0.1 2024-08-10 15:36:35,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=623920.0, ans=0.0 2024-08-10 15:36:38,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.993e+01 3.424e+01 4.007e+01 6.509e+01, threshold=6.848e+01, percent-clipped=2.0 2024-08-10 15:36:50,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-08-10 15:37:00,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624120.0, ans=0.1 2024-08-10 15:37:04,278 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 15:37:15,565 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4450, loss[loss=0.1127, beats_loss=0.01192, ecapa_loss=0.0002393, whisper_loss=0.09836, over 23305.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01181, ecapa_loss=0.0002438, whisper_loss=0.09603, over 3858286.55 frames. ], batch size: 93, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:37:21,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=624220.0, ans=0.2 2024-08-10 15:37:23,217 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 15:37:32,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=624320.0, ans=0.0 2024-08-10 15:37:40,050 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 15:38:08,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=624520.0, ans=0.125 2024-08-10 15:38:16,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=624520.0, ans=0.1 2024-08-10 15:38:37,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=624720.0, ans=0.125 2024-08-10 15:38:39,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4500, loss[loss=0.09873, beats_loss=0.0122, ecapa_loss=0.0002768, whisper_loss=0.08376, over 17132.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01183, ecapa_loss=0.0002435, whisper_loss=0.09516, over 3835073.87 frames. ], batch size: 74, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:38:47,783 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-10 15:38:48,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=624720.0, ans=0.125 2024-08-10 15:38:51,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2024-08-10 15:38:53,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=624720.0, ans=0.125 2024-08-10 15:38:58,179 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
30 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 15:39:27,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.908e+01 3.221e+01 3.849e+01 6.109e+01, threshold=6.442e+01, percent-clipped=0.0 2024-08-10 15:39:27,673 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-10 15:39:40,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=625020.0, ans=0.0 2024-08-10 15:39:49,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2024-08-10 15:39:52,302 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 15:39:56,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625120.0, ans=0.1 2024-08-10 15:40:05,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4550, loss[loss=0.1231, beats_loss=0.01135, ecapa_loss=0.000312, whisper_loss=0.1086, over 16136.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01194, ecapa_loss=0.0002413, whisper_loss=0.09505, over 3870451.53 frames. ], batch size: 68, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:40:16,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=625220.0, ans=0.125 2024-08-10 15:40:23,088 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 15:40:27,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=625320.0, ans=0.015 2024-08-10 15:40:47,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=625420.0, ans=0.125 2024-08-10 15:41:11,936 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-10 15:41:12,210 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:41:13,544 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 15:41:17,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=625620.0, ans=0.0 2024-08-10 15:41:22,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625720.0, ans=0.1 2024-08-10 15:41:23,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4600, loss[loss=0.0788, beats_loss=0.01248, ecapa_loss=0.0002312, whisper_loss=0.06402, over 14661.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01198, ecapa_loss=0.0002393, whisper_loss=0.0945, over 3884958.64 frames. ], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:41:25,059 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:41:27,726 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 15:41:36,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=625720.0, ans=0.125 2024-08-10 15:41:59,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625920.0, ans=0.1 2024-08-10 15:42:07,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.652e+01 3.147e+01 3.453e+01 6.048e+01, threshold=6.293e+01, percent-clipped=0.0 2024-08-10 15:42:26,980 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 18 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-10 15:42:34,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2024-08-10 15:42:42,295 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4650, loss[loss=0.1477, beats_loss=0.01043, ecapa_loss=0.0002098, whisper_loss=0.1352, over 17095.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01185, ecapa_loss=0.0002419, whisper_loss=0.0948, over 3884063.78 frames. ], batch size: 63, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:42:44,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626220.0, ans=0.1 2024-08-10 15:42:45,856 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-10 15:42:50,451 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 15:42:50,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=626220.0, ans=0.125 2024-08-10 15:43:11,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=626320.0, ans=0.2 2024-08-10 15:43:50,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=626620.0, ans=0.125 2024-08-10 15:43:53,045 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 15:43:55,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-08-10 15:43:57,776 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 38 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-10 15:43:57,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=626620.0, ans=0.0 2024-08-10 15:44:03,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4700, loss[loss=0.1102, beats_loss=0.01192, ecapa_loss=0.0002222, whisper_loss=0.09603, over 22872.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01177, ecapa_loss=0.0002413, whisper_loss=0.09562, over 3888879.01 frames. ], batch size: 89, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:44:04,014 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 15:44:09,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=626720.0, ans=0.0 2024-08-10 15:44:11,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. 
limit=15.0 2024-08-10 15:44:16,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626720.0, ans=0.1 2024-08-10 15:44:16,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-10 15:44:28,973 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 15:44:31,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=626820.0, ans=0.0 2024-08-10 15:44:48,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.667e+01 3.139e+01 3.783e+01 7.574e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-10 15:45:04,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-08-10 15:45:11,442 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 15:45:13,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-10 15:45:24,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4750, loss[loss=0.1074, beats_loss=0.01038, ecapa_loss=0.0001969, whisper_loss=0.09503, over 16004.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01187, ecapa_loss=0.000242, whisper_loss=0.09473, over 3896945.85 frames. 
], batch size: 59, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:45:30,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=627220.0, ans=0.0 2024-08-10 15:45:32,522 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.71 vs. limit=10.0 2024-08-10 15:46:08,807 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 15:46:10,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=627420.0, ans=0.025 2024-08-10 15:46:12,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627420.0, ans=0.1 2024-08-10 15:46:31,101 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 15:46:38,550 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 15:46:43,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=627620.0, ans=0.2 2024-08-10 15:46:46,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=627720.0, ans=0.2 2024-08-10 15:46:47,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4800, loss[loss=0.1257, beats_loss=0.01069, ecapa_loss=0.0002466, whisper_loss=0.1126, over 15138.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0119, ecapa_loss=0.0002442, whisper_loss=0.09455, over 3884751.74 frames. ], batch size: 58, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:47:09,044 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:47:35,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-10 15:47:35,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.946e+01 3.351e+01 4.117e+01 7.010e+01, threshold=6.703e+01, percent-clipped=2.0 2024-08-10 15:48:06,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2024-08-10 15:48:09,596 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 15:48:11,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628220.0, ans=0.1 2024-08-10 15:48:12,491 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4850, loss[loss=0.09567, beats_loss=0.01513, ecapa_loss=0.0002382, whisper_loss=0.07816, over 13833.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.012, ecapa_loss=0.000243, whisper_loss=0.09467, over 3883794.21 frames. ], batch size: 56, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:48:24,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=628220.0, ans=0.07 2024-08-10 15:48:30,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. 
limit=15.0 2024-08-10 15:49:00,158 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:49:01,875 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.014e-01 2024-08-10 15:49:06,880 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.498e-01 2024-08-10 15:49:22,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-10 15:49:25,360 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 15:49:35,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=628720.0, ans=0.125 2024-08-10 15:49:35,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4900, loss[loss=0.1075, beats_loss=0.01473, ecapa_loss=0.0001949, whisper_loss=0.09078, over 20849.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01201, ecapa_loss=0.0002416, whisper_loss=0.09485, over 3885716.45 frames. 
], batch size: 84, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:49:58,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=628820.0, ans=0.0 2024-08-10 15:50:06,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=628920.0, ans=0.125 2024-08-10 15:50:19,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.794e+01 3.081e+01 3.669e+01 6.406e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-10 15:50:42,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=629120.0, ans=0.125 2024-08-10 15:50:44,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=629120.0, ans=0.0 2024-08-10 15:50:44,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629120.0, ans=0.1 2024-08-10 15:50:54,933 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 4950, loss[loss=0.1295, beats_loss=0.01015, ecapa_loss=0.0002366, whisper_loss=0.117, over 23414.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01198, ecapa_loss=0.0002412, whisper_loss=0.0946, over 3866243.46 frames. ], batch size: 90, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:50:55,064 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 15:51:13,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=629320.0, ans=0.125 2024-08-10 15:51:19,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2024-08-10 15:51:25,331 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 15:51:41,797 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 15:52:05,902 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 15:52:09,334 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 15:52:15,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5000, loss[loss=0.09449, beats_loss=0.01307, ecapa_loss=0.0002243, whisper_loss=0.07918, over 17655.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01193, ecapa_loss=0.0002418, whisper_loss=0.0949, over 3851052.74 frames. ], batch size: 71, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:52:23,012 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 15:52:34,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629820.0, ans=0.1 2024-08-10 15:52:35,662 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.941e-01 2024-08-10 15:52:36,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-08-10 15:52:36,529 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 11 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 15:53:04,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.939e+01 3.385e+01 3.961e+01 1.332e+02, threshold=6.770e+01, percent-clipped=1.0 2024-08-10 15:53:07,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. 
limit=15.0 2024-08-10 15:53:13,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=630020.0, ans=0.125 2024-08-10 15:53:35,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=630120.0, ans=0.125 2024-08-10 15:53:37,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5050, loss[loss=0.0947, beats_loss=0.01403, ecapa_loss=0.0002052, whisper_loss=0.07862, over 13914.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01201, ecapa_loss=0.0002413, whisper_loss=0.09468, over 3873030.11 frames. ], batch size: 56, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:53:49,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=630220.0, ans=0.2 2024-08-10 15:53:58,046 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 15:54:05,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=630320.0, ans=0.125 2024-08-10 15:54:36,197 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 15:54:43,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=630620.0, ans=0.2 2024-08-10 15:54:59,620 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5100, loss[loss=0.1145, beats_loss=0.008166, ecapa_loss=0.000278, whisper_loss=0.1035, over 14219.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01192, ecapa_loss=0.0002423, whisper_loss=0.09511, over 3839078.36 frames. 
], batch size: 56, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:55:03,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=630720.0, ans=0.125 2024-08-10 15:55:06,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=630720.0, ans=0.125 2024-08-10 15:55:20,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-10 15:55:30,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630820.0, ans=0.1 2024-08-10 15:55:44,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.958e+01 3.434e+01 3.932e+01 6.642e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 15:55:48,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=631020.0, ans=0.125 2024-08-10 15:55:51,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=631020.0, ans=0.125 2024-08-10 15:56:20,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5150, loss[loss=0.1087, beats_loss=0.01218, ecapa_loss=0.0002209, whisper_loss=0.09432, over 22117.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01193, ecapa_loss=0.0002409, whisper_loss=0.09551, over 3877411.38 frames. ], batch size: 87, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:56:53,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631420.0, ans=0.1 2024-08-10 15:57:04,508 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 15:57:11,411 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.669e-01 2024-08-10 15:57:24,440 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 15:57:31,498 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 15:57:37,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5200, loss[loss=0.1178, beats_loss=0.01019, ecapa_loss=0.0002427, whisper_loss=0.1052, over 18119.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01188, ecapa_loss=0.0002421, whisper_loss=0.0961, over 3869891.86 frames. ], batch size: 68, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:57:39,359 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 15:57:47,165 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. 
limit=15.0 2024-08-10 15:57:51,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=631820.0, ans=0.125 2024-08-10 15:57:55,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=631820.0, ans=0.125 2024-08-10 15:57:55,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631820.0, ans=0.125 2024-08-10 15:58:07,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631920.0, ans=0.1 2024-08-10 15:58:19,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.945e+01 3.443e+01 4.072e+01 7.195e+01, threshold=6.886e+01, percent-clipped=1.0 2024-08-10 15:58:36,259 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 15:58:45,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632120.0, ans=0.1 2024-08-10 15:58:51,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5250, loss[loss=0.1138, beats_loss=0.01284, ecapa_loss=0.000283, whisper_loss=0.09809, over 22233.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01186, ecapa_loss=0.0002441, whisper_loss=0.09563, over 3878653.77 frames. 
], batch size: 93, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:59:10,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632320.0, ans=0.1 2024-08-10 15:59:11,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=632320.0, ans=0.0 2024-08-10 15:59:13,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-08-10 15:59:21,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-08-10 15:59:22,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=632420.0, ans=0.125 2024-08-10 15:59:45,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=632520.0, ans=0.2 2024-08-10 16:00:06,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=632720.0, ans=0.2 2024-08-10 16:00:07,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5300, loss[loss=0.112, beats_loss=0.01135, ecapa_loss=0.0002869, whisper_loss=0.09774, over 21515.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0118, ecapa_loss=0.0002434, whisper_loss=0.09552, over 3860698.43 frames. ], batch size: 90, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:00:16,430 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 16:00:18,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=632720.0, ans=0.125 2024-08-10 16:00:19,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=632820.0, ans=0.125 2024-08-10 16:00:33,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=632820.0, ans=0.0 2024-08-10 16:00:34,442 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 16:00:44,526 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:00:47,200 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:00:47,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.840e+01 3.204e+01 3.763e+01 6.547e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 16:00:50,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=633020.0, ans=0.2 2024-08-10 16:01:09,358 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 16:01:12,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=633120.0, ans=0.125 2024-08-10 16:01:13,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-10 16:01:18,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5350, loss[loss=0.09783, beats_loss=0.01078, ecapa_loss=0.0002544, whisper_loss=0.08451, over 14631.00 frames. 
], tot_loss[loss=0.1095, beats_loss=0.0118, ecapa_loss=0.0002415, whisper_loss=0.09525, over 3871796.96 frames. ], batch size: 55, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:01:21,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=633220.0, ans=0.0 2024-08-10 16:01:22,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=633220.0, ans=0.125 2024-08-10 16:01:39,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633320.0, ans=0.1 2024-08-10 16:02:11,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633520.0, ans=0.1 2024-08-10 16:02:21,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=633620.0, ans=0.1 2024-08-10 16:02:28,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5400, loss[loss=0.1058, beats_loss=0.0105, ecapa_loss=0.0002465, whisper_loss=0.09285, over 20052.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01178, ecapa_loss=0.0002402, whisper_loss=0.09586, over 3886762.13 frames. ], batch size: 80, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:02:28,727 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 16:02:30,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=633720.0, ans=0.125 2024-08-10 16:02:32,702 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 16:02:35,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633720.0, ans=0.0 2024-08-10 16:02:48,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=633820.0, ans=0.125 2024-08-10 16:02:55,264 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 16:03:07,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.970e+01 3.287e+01 3.858e+01 5.350e+01, threshold=6.575e+01, percent-clipped=0.0 2024-08-10 16:03:22,750 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 16:03:24,081 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 16:03:24,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634120.0, ans=0.1 2024-08-10 16:03:37,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5450, loss[loss=0.1067, beats_loss=0.01028, ecapa_loss=0.0002605, whisper_loss=0.09381, over 21446.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01177, ecapa_loss=0.000241, whisper_loss=0.09533, over 3883453.55 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:03:41,620 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 16:03:47,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=634220.0, ans=0.125 2024-08-10 16:04:09,657 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 16:04:15,765 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 16:04:19,696 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 16:04:23,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=634520.0, ans=0.125 2024-08-10 16:04:35,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634620.0, ans=0.125 2024-08-10 16:04:43,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=634720.0, ans=0.0 2024-08-10 16:04:44,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5500, loss[loss=0.1237, beats_loss=0.01104, ecapa_loss=0.0002963, whisper_loss=0.1097, over 20168.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01178, ecapa_loss=0.0002389, whisper_loss=0.09511, over 3861437.79 frames. ], batch size: 84, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:04:44,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=634720.0, ans=0.0 2024-08-10 16:04:47,216 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 27 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 16:04:52,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634720.0, ans=0.125 2024-08-10 16:04:54,653 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 16:05:06,519 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 16:05:08,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=634820.0, ans=0.125 2024-08-10 16:05:11,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.20 vs. limit=10.0 2024-08-10 16:05:22,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.955e+01 3.201e+01 3.849e+01 6.033e+01, threshold=6.402e+01, percent-clipped=0.0 2024-08-10 16:05:25,357 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-10 16:05:47,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=635120.0, ans=0.125 2024-08-10 16:05:47,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=635120.0, ans=0.5 2024-08-10 16:05:48,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=635120.0, ans=0.125 2024-08-10 16:05:52,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5550, loss[loss=0.09646, beats_loss=0.01469, ecapa_loss=0.0002741, whisper_loss=0.07902, over 15967.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01177, ecapa_loss=0.0002392, whisper_loss=0.09512, over 3854915.17 frames. ], batch size: 70, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:05:57,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635220.0, ans=0.1 2024-08-10 16:06:26,229 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 16:06:29,722 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 16:06:31,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=635520.0, ans=0.0 2024-08-10 16:06:38,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-10 16:06:42,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-10 16:06:49,712 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 16:06:52,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=635620.0, ans=0.125 2024-08-10 16:06:55,194 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.463e-02 2024-08-10 16:06:58,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5600, loss[loss=0.08511, beats_loss=0.0143, ecapa_loss=0.0002421, whisper_loss=0.06839, over 16124.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01185, ecapa_loss=0.0002373, whisper_loss=0.09476, over 3869410.11 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:06:59,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=635720.0, ans=0.125 2024-08-10 16:07:05,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2024-08-10 16:07:17,352 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 16:07:19,861 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 16:07:21,066 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 16:07:21,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=635820.0, ans=0.1 2024-08-10 16:07:25,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635920.0, ans=0.1 2024-08-10 16:07:35,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.708e+01 3.041e+01 3.496e+01 5.299e+01, threshold=6.081e+01, percent-clipped=0.0 2024-08-10 16:07:39,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636020.0, ans=0.1 2024-08-10 16:07:44,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-10 16:07:50,233 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 16:08:04,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5650, loss[loss=0.08874, beats_loss=0.01232, ecapa_loss=0.0002889, whisper_loss=0.07353, over 21676.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01186, ecapa_loss=0.0002386, whisper_loss=0.09514, over 3873233.34 frames. ], batch size: 93, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:08:07,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=636220.0, ans=0.07 2024-08-10 16:08:25,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. 
limit=15.0 2024-08-10 16:08:27,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=636320.0, ans=0.125 2024-08-10 16:08:39,210 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-10 16:08:46,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636520.0, ans=0.1 2024-08-10 16:08:50,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636520.0, ans=0.1 2024-08-10 16:08:52,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636520.0, ans=0.125 2024-08-10 16:08:54,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=636520.0, ans=0.09899494936611666 2024-08-10 16:08:55,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=636520.0, ans=0.2 2024-08-10 16:08:59,304 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 16:08:59,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636620.0, ans=0.125 2024-08-10 16:09:10,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5700, loss[loss=0.1253, beats_loss=0.01051, ecapa_loss=0.000236, whisper_loss=0.1124, over 22826.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01183, ecapa_loss=0.0002404, whisper_loss=0.09584, over 3894545.26 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:09:10,607 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 16:09:19,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=636720.0, ans=0.015 2024-08-10 16:09:38,619 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 16:09:46,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=636920.0, ans=0.04949747468305833 2024-08-10 16:09:48,305 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.928e+01 3.301e+01 4.183e+01 7.157e+01, threshold=6.602e+01, percent-clipped=2.0 2024-08-10 16:09:50,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637020.0, ans=0.1 2024-08-10 16:09:57,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=637020.0, ans=0.125 2024-08-10 16:10:16,557 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 15 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 16:10:19,256 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5750, loss[loss=0.08313, beats_loss=0.01366, ecapa_loss=0.0002671, whisper_loss=0.0668, over 13367.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01186, ecapa_loss=0.0002406, whisper_loss=0.0957, over 3908657.81 frames. ], batch size: 57, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:10:22,011 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 16:10:22,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=637220.0, ans=0.125 2024-08-10 16:10:24,817 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 16:10:27,961 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
25 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-10 16:10:29,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-08-10 16:10:33,457 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 16:10:33,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=637320.0, ans=0.125 2024-08-10 16:10:33,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=637320.0, ans=0.2 2024-08-10 16:10:39,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=637320.0, ans=0.125 2024-08-10 16:10:46,871 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.475e-01 2024-08-10 16:10:52,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=637420.0, ans=0.0 2024-08-10 16:10:55,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=637420.0, ans=0.0 2024-08-10 16:11:11,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=637520.0, ans=0.2 2024-08-10 16:11:15,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=637620.0, ans=0.07 2024-08-10 16:11:22,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=637620.0, ans=0.125 2024-08-10 16:11:28,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5800, loss[loss=0.1127, beats_loss=0.01036, ecapa_loss=0.0002684, whisper_loss=0.09969, over 21667.00 frames. 
], tot_loss[loss=0.1099, beats_loss=0.01187, ecapa_loss=0.0002398, whisper_loss=0.09567, over 3876510.88 frames. ], batch size: 88, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:11:30,553 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.139e-01 2024-08-10 16:11:37,962 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.719e-02 2024-08-10 16:11:47,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=637820.0, ans=0.0 2024-08-10 16:12:02,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=637920.0, ans=0.125 2024-08-10 16:12:07,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.715e+01 3.192e+01 3.464e+01 4.938e+01, threshold=6.385e+01, percent-clipped=0.0 2024-08-10 16:12:12,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=638020.0, ans=0.0 2024-08-10 16:12:16,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2024-08-10 16:12:38,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5850, loss[loss=0.1246, beats_loss=0.01211, ecapa_loss=0.0002285, whisper_loss=0.1102, over 23958.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01187, ecapa_loss=0.0002404, whisper_loss=0.09468, over 3855233.28 frames. ], batch size: 93, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:12:58,169 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 16:13:13,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=638420.0, ans=0.125 2024-08-10 16:13:18,915 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 16:13:34,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638620.0, ans=0.1 2024-08-10 16:13:38,698 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 16:13:48,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5900, loss[loss=0.1235, beats_loss=0.01007, ecapa_loss=0.0002084, whisper_loss=0.1113, over 22532.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01191, ecapa_loss=0.0002421, whisper_loss=0.09371, over 3830994.39 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:13:50,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=638720.0, ans=0.0 2024-08-10 16:13:51,229 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 16:13:59,207 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-10 16:14:04,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. 
limit=22.5 2024-08-10 16:14:06,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=638820.0, ans=0.125 2024-08-10 16:14:10,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=638820.0, ans=0.125 2024-08-10 16:14:17,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=638920.0, ans=0.035 2024-08-10 16:14:20,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2024-08-10 16:14:22,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=638920.0, ans=0.0 2024-08-10 16:14:26,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.959e+01 3.304e+01 3.845e+01 6.831e+01, threshold=6.608e+01, percent-clipped=1.0 2024-08-10 16:14:28,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=25.15 vs. limit=15.0 2024-08-10 16:14:34,949 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 16:14:39,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-10 16:14:46,073 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 16:14:55,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=639220.0, ans=0.0 2024-08-10 16:14:55,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639220.0, ans=0.1 2024-08-10 16:14:55,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=639220.0, ans=0.0 2024-08-10 16:14:56,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 5950, loss[loss=0.102, beats_loss=0.01295, ecapa_loss=0.0002773, whisper_loss=0.08624, over 19895.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01197, ecapa_loss=0.000242, whisper_loss=0.09334, over 3823490.57 frames. ], batch size: 84, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:14:56,808 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 16:15:05,024 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 16:15:32,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2024-08-10 16:15:37,375 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 16:16:07,862 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6000, loss[loss=0.09094, beats_loss=0.01578, ecapa_loss=0.0001707, whisper_loss=0.07345, over 22213.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01196, ecapa_loss=0.0002404, whisper_loss=0.09384, over 3847846.84 frames. 
], batch size: 90, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:16:07,863 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 16:16:49,365 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on ASR_libri: loss=0.2642, beats_loss=0, ecapa_loss=0.0007414, whisper_loss=0.2567, over 922467.00 frames. 2024-08-10 16:17:05,218 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2989, 4.7383, 4.8575, 5.2204], device='cuda:1') 2024-08-10 16:17:08,394 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on SV_voxceleb1: loss=0.006164, beats_loss=0, ecapa_loss=0.0006164, whisper_loss=0, over 939242.00 frames. 2024-08-10 16:19:02,520 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on AT_audioset: loss=0.02682, beats_loss=0.02682, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 16:19:02,525 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 16:19:11,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639720.0, ans=0.1 2024-08-10 16:19:12,657 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 16:19:39,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=639920.0, ans=0.2 2024-08-10 16:19:44,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=639920.0, ans=0.125 2024-08-10 16:19:45,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.869e+01 3.209e+01 3.631e+01 6.157e+01, threshold=6.418e+01, percent-clipped=0.0 2024-08-10 16:19:47,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. 
limit=15.0 2024-08-10 16:20:02,101 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 16:20:08,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-08-10 16:20:13,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=640120.0, ans=0.125 2024-08-10 16:20:15,504 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6050, loss[loss=0.1096, beats_loss=0.01361, ecapa_loss=0.0001961, whisper_loss=0.09403, over 20563.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01177, ecapa_loss=0.0002411, whisper_loss=0.09448, over 3804049.23 frames. ], batch size: 79, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:20:21,466 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 16:20:30,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640320.0, ans=0.1 2024-08-10 16:20:34,141 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 16:20:35,430 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 16:20:42,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=640420.0, ans=0.125 2024-08-10 16:20:51,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.04 vs. limit=22.5 2024-08-10 16:21:12,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=640520.0, ans=0.1 2024-08-10 16:21:14,921 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 16:21:32,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6100, loss[loss=0.1195, beats_loss=0.0106, ecapa_loss=0.0002755, whisper_loss=0.1061, over 21487.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01174, ecapa_loss=0.0002424, whisper_loss=0.0951, over 3837570.12 frames. ], batch size: 87, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:21:32,625 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 16:21:37,135 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 16:22:05,566 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-10 16:22:15,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.045e+01 3.489e+01 4.204e+01 8.442e+01, threshold=6.977e+01, percent-clipped=4.0 2024-08-10 16:22:21,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-08-10 16:22:29,274 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 16:22:47,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=641220.0, ans=0.125 2024-08-10 16:22:48,123 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6150, loss[loss=0.09482, beats_loss=0.01163, ecapa_loss=0.0002203, whisper_loss=0.08098, over 17369.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01181, ecapa_loss=0.0002424, whisper_loss=0.09488, over 3839157.00 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:22:48,313 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 16:22:51,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=641220.0, ans=0.0 2024-08-10 16:22:51,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=641220.0, ans=0.125 2024-08-10 16:22:54,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641220.0, ans=0.125 2024-08-10 16:23:06,428 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 16:23:09,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=641320.0, ans=0.07 2024-08-10 16:23:10,668 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 16:23:12,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=641320.0, ans=0.0 2024-08-10 16:23:19,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641420.0, ans=0.1 2024-08-10 16:23:23,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641420.0, ans=0.1 2024-08-10 16:23:26,773 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 16:23:40,050 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 16:24:03,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6200, loss[loss=0.1135, beats_loss=0.01081, ecapa_loss=0.0002425, whisper_loss=0.1002, over 19145.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01178, ecapa_loss=0.0002444, whisper_loss=0.09421, over 3823053.94 frames. 
], batch size: 77, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:24:26,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=641820.0, ans=0.125 2024-08-10 16:24:31,829 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-10 16:24:34,541 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 16:24:39,835 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 16:24:42,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.777e+01 3.185e+01 3.780e+01 9.777e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 16:24:43,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=641920.0, ans=0.0 2024-08-10 16:24:52,935 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 16:24:54,299 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 16:25:00,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=642120.0, ans=0.0 2024-08-10 16:25:11,379 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 16:25:16,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6250, loss[loss=0.09796, beats_loss=0.01405, ecapa_loss=0.0002115, whisper_loss=0.08179, over 19824.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01175, ecapa_loss=0.0002439, whisper_loss=0.09501, over 3850303.83 frames. ], batch size: 80, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:25:16,442 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 16:25:36,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=642320.0, ans=0.2 2024-08-10 16:25:39,493 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 16:25:52,025 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 16:25:54,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=642420.0, ans=0.0 2024-08-10 16:26:08,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=642520.0, ans=0.2 2024-08-10 16:26:22,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-10 16:26:29,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=642620.0, ans=0.0 2024-08-10 16:26:31,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6300, loss[loss=0.1127, beats_loss=0.01272, ecapa_loss=0.0002709, whisper_loss=0.09722, over 12864.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01176, ecapa_loss=0.0002426, whisper_loss=0.09539, over 3848560.00 frames. ], batch size: 54, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:26:40,974 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 16:26:41,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=642720.0, ans=0.0 2024-08-10 16:27:08,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=642920.0, ans=0.0 2024-08-10 16:27:14,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.923e+01 3.266e+01 3.609e+01 6.240e+01, threshold=6.531e+01, percent-clipped=0.0 2024-08-10 16:27:16,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=643020.0, ans=0.125 2024-08-10 16:27:22,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2024-08-10 16:27:33,554 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 16:27:37,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643120.0, ans=0.0 2024-08-10 16:27:45,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=643220.0, ans=0.2 2024-08-10 16:27:46,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6350, loss[loss=0.122, beats_loss=0.01147, ecapa_loss=0.000234, whisper_loss=0.1082, over 21603.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01166, ecapa_loss=0.0002433, whisper_loss=0.09614, over 3817506.69 frames. ], batch size: 83, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:27:54,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=643220.0, ans=0.125 2024-08-10 16:27:55,988 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 16:28:05,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643320.0, ans=0.125 2024-08-10 16:28:24,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-10 16:28:38,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643520.0, ans=0.0 2024-08-10 16:28:49,200 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 16:28:55,431 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 11 from Vox, 46 fro AS 2024-08-10 16:28:57,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6400, loss[loss=0.09386, beats_loss=0.0141, ecapa_loss=0.0002175, whisper_loss=0.07758, over 20732.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0118, ecapa_loss=0.0002403, whisper_loss=0.09565, over 3850831.68 frames. ], batch size: 84, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:29:04,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=643720.0, ans=0.125 2024-08-10 16:29:28,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=643920.0, ans=0.1 2024-08-10 16:29:35,737 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 16:29:36,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.802e+01 3.219e+01 3.654e+01 6.592e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-10 16:29:37,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=643920.0, ans=0.0 2024-08-10 16:29:43,313 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.174e+00 2024-08-10 16:29:45,454 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 16:29:59,097 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 16:30:01,964 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 16:30:03,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=644120.0, ans=0.0 2024-08-10 16:30:07,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6450, loss[loss=0.1147, beats_loss=0.009689, ecapa_loss=0.0002467, whisper_loss=0.1026, over 18129.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01185, ecapa_loss=0.0002399, whisper_loss=0.09518, over 3862480.03 frames. ], batch size: 72, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:30:07,233 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 16:30:11,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=644220.0, ans=0.125 2024-08-10 16:30:13,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644220.0, ans=0.125 2024-08-10 16:30:33,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=644420.0, ans=0.125 2024-08-10 16:30:34,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-10 16:30:46,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=644520.0, ans=0.125 2024-08-10 16:31:02,412 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 16:31:03,940 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 16:31:06,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644620.0, ans=0.125 2024-08-10 16:31:08,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=644620.0, ans=0.0 2024-08-10 16:31:14,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6500, loss[loss=0.1057, beats_loss=0.01254, ecapa_loss=0.0002009, whisper_loss=0.09115, over 17827.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01185, ecapa_loss=0.0002394, whisper_loss=0.09608, over 3886942.48 frames. ], batch size: 72, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:31:44,337 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 16:31:45,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644920.0, ans=0.1 2024-08-10 16:31:51,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=644920.0, ans=0.04949747468305833 2024-08-10 16:31:53,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.031e+01 3.251e+01 3.712e+01 6.418e+01, threshold=6.501e+01, percent-clipped=0.0 2024-08-10 16:32:04,063 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 16:32:15,070 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 16:32:22,687 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.675e-03 2024-08-10 16:32:23,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6550, loss[loss=0.1111, beats_loss=0.01324, ecapa_loss=0.0002793, whisper_loss=0.09512, over 18007.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01191, ecapa_loss=0.0002398, whisper_loss=0.09605, over 3885431.87 frames. ], batch size: 73, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:32:28,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.19 vs. limit=15.0 2024-08-10 16:32:33,480 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 16:32:34,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=15.0 2024-08-10 16:32:36,374 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 16:32:44,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=645320.0, ans=0.125 2024-08-10 16:32:46,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=645320.0, ans=0.0 2024-08-10 16:32:50,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2024-08-10 16:33:00,690 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 16:33:02,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=645420.0, ans=0.125 2024-08-10 16:33:03,552 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 16:33:09,475 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 16:33:22,769 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 16:33:28,073 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:33:32,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6600, loss[loss=0.1164, beats_loss=0.01054, ecapa_loss=0.0002193, whisper_loss=0.1037, over 17410.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01188, ecapa_loss=0.0002418, whisper_loss=0.09613, over 3904951.60 frames. ], batch size: 66, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:33:49,369 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 16:34:05,551 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 16:34:11,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.864e+01 3.355e+01 3.985e+01 6.693e+01, threshold=6.710e+01, percent-clipped=1.0 2024-08-10 16:34:11,411 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 16:34:23,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.21 vs. limit=22.5 2024-08-10 16:34:41,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=646120.0, ans=0.0 2024-08-10 16:34:43,478 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6650, loss[loss=0.119, beats_loss=0.01046, ecapa_loss=0.0002245, whisper_loss=0.1063, over 17638.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01178, ecapa_loss=0.000243, whisper_loss=0.09631, over 3907960.98 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:34:52,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=646220.0, ans=0.0 2024-08-10 16:35:13,162 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
22 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-10 16:35:16,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=646420.0, ans=0.0 2024-08-10 16:35:17,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=646420.0, ans=0.2 2024-08-10 16:35:27,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=646520.0, ans=0.0 2024-08-10 16:35:28,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=646520.0, ans=0.2 2024-08-10 16:35:31,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=646520.0, ans=0.125 2024-08-10 16:35:38,756 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:35:44,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=646620.0, ans=0.07 2024-08-10 16:35:55,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6700, loss[loss=0.1232, beats_loss=0.01211, ecapa_loss=0.0001993, whisper_loss=0.1091, over 23316.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01181, ecapa_loss=0.0002417, whisper_loss=0.0957, over 3892977.04 frames. ], batch size: 89, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:35:56,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=646720.0, ans=0.0 2024-08-10 16:36:01,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=646720.0, ans=0.0 2024-08-10 16:36:03,763 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
22 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 16:36:04,126 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.465e-01 2024-08-10 16:36:10,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=646820.0, ans=0.0 2024-08-10 16:36:11,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=646820.0, ans=0.05 2024-08-10 16:36:15,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2024-08-10 16:36:16,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2024-08-10 16:36:17,816 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 16:36:19,651 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 16:36:25,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=646920.0, ans=0.0 2024-08-10 16:36:34,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.879e+01 3.180e+01 3.709e+01 5.171e+01, threshold=6.361e+01, percent-clipped=0.0 2024-08-10 16:36:38,940 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 16:36:40,403 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 16:37:05,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6750, loss[loss=0.1129, beats_loss=0.0152, ecapa_loss=0.0002325, whisper_loss=0.09541, over 22138.00 frames. 
], tot_loss[loss=0.1096, beats_loss=0.01192, ecapa_loss=0.0002409, whisper_loss=0.09528, over 3896880.29 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:37:10,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=647220.0, ans=0.0 2024-08-10 16:37:14,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=647220.0, ans=0.125 2024-08-10 16:37:18,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=647320.0, ans=0.0 2024-08-10 16:37:18,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=647320.0, ans=0.125 2024-08-10 16:37:24,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647320.0, ans=0.1 2024-08-10 16:37:24,556 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:37:29,909 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 16:37:50,904 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.385e+03 2024-08-10 16:38:12,904 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6800, loss[loss=0.1071, beats_loss=0.01339, ecapa_loss=0.0002047, whisper_loss=0.09164, over 19978.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01185, ecapa_loss=0.000242, whisper_loss=0.09514, over 3899077.53 frames. ], batch size: 79, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:38:21,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=647720.0, ans=0.2 2024-08-10 16:38:23,536 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
13 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 16:38:30,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=647820.0, ans=0.125 2024-08-10 16:38:33,560 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 16:38:44,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=647920.0, ans=0.125 2024-08-10 16:38:44,157 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.588e-02 2024-08-10 16:38:53,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.892e+01 3.322e+01 4.059e+01 7.063e+01, threshold=6.643e+01, percent-clipped=1.0 2024-08-10 16:39:13,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=648120.0, ans=15.0 2024-08-10 16:39:22,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6850, loss[loss=0.135, beats_loss=0.008947, ecapa_loss=0.0002943, whisper_loss=0.1231, over 21431.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01185, ecapa_loss=0.0002403, whisper_loss=0.09472, over 3876507.47 frames. ], batch size: 89, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:39:27,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=648220.0, ans=0.04949747468305833 2024-08-10 16:39:30,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=648220.0, ans=0.2 2024-08-10 16:39:36,832 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 16:39:39,689 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 16:40:15,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=648520.0, ans=0.1 2024-08-10 16:40:21,696 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 16:40:23,215 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.346e-01 2024-08-10 16:40:28,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-10 16:40:31,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6900, loss[loss=0.07701, beats_loss=0.01464, ecapa_loss=0.0001884, whisper_loss=0.06049, over 17208.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01188, ecapa_loss=0.0002391, whisper_loss=0.09466, over 3858284.98 frames. ], batch size: 71, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:40:33,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2024-08-10 16:40:55,704 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 16:41:10,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.858e+01 3.304e+01 3.695e+01 5.634e+01, threshold=6.608e+01, percent-clipped=0.0 2024-08-10 16:41:16,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-10 16:41:20,171 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
32 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 16:41:24,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649020.0, ans=0.1 2024-08-10 16:41:26,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649120.0, ans=0.1 2024-08-10 16:41:32,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=649120.0, ans=0.0 2024-08-10 16:41:40,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 6950, loss[loss=0.1047, beats_loss=0.01225, ecapa_loss=0.0002398, whisper_loss=0.09006, over 18945.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01186, ecapa_loss=0.000238, whisper_loss=0.09494, over 3853894.46 frames. ], batch size: 79, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:41:47,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=649220.0, ans=0.2 2024-08-10 16:41:48,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=649220.0, ans=0.0 2024-08-10 16:42:02,235 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 16:42:05,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=649320.0, ans=0.125 2024-08-10 16:42:05,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649320.0, ans=0.125 2024-08-10 16:42:20,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649520.0, ans=0.1 2024-08-10 16:42:49,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7000, loss[loss=0.1243, beats_loss=0.01138, ecapa_loss=0.000261, whisper_loss=0.1104, over 15816.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01188, ecapa_loss=0.0002387, whisper_loss=0.0955, over 3860800.19 frames. ], batch size: 63, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:43:01,170 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 16:43:04,948 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 16:43:08,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. 
limit=22.5 2024-08-10 16:43:25,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.800e+01 3.263e+01 3.998e+01 9.402e+01, threshold=6.527e+01, percent-clipped=1.0 2024-08-10 16:43:33,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=650020.0, ans=0.2 2024-08-10 16:43:40,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=650120.0, ans=0.125 2024-08-10 16:43:55,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7050, loss[loss=0.1199, beats_loss=0.01128, ecapa_loss=0.0001985, whisper_loss=0.1066, over 19858.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.012, ecapa_loss=0.0002376, whisper_loss=0.09487, over 3886200.75 frames. ], batch size: 77, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:43:59,680 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 16:44:01,159 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 16:44:06,543 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 16:44:09,214 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 16:44:26,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=650420.0, ans=0.0 2024-08-10 16:44:29,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=650420.0, ans=0.125 2024-08-10 16:44:37,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=650520.0, ans=0.0 2024-08-10 16:45:00,565 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 16:45:01,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7100, loss[loss=0.09866, beats_loss=0.009356, ecapa_loss=0.0003258, whisper_loss=0.08605, over 16576.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01187, ecapa_loss=0.0002387, whisper_loss=0.0952, over 3884565.59 frames. ], batch size: 69, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:45:06,623 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-08-10 16:45:11,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2024-08-10 16:45:14,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=650820.0, ans=0.125 2024-08-10 16:45:15,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=650820.0, ans=0.125 2024-08-10 16:45:27,688 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 16:45:29,114 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 16:45:30,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=650920.0, ans=0.0 2024-08-10 16:45:36,879 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 16:45:39,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.645e+01 3.161e+01 3.535e+01 5.692e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 16:45:50,314 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
27 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 16:45:51,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=651020.0, ans=0.125 2024-08-10 16:46:02,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651120.0, ans=0.125 2024-08-10 16:46:02,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=651120.0, ans=0.125 2024-08-10 16:46:03,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-10 16:46:07,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=651220.0, ans=0.0 2024-08-10 16:46:08,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7150, loss[loss=0.1047, beats_loss=0.01451, ecapa_loss=0.0002061, whisper_loss=0.08817, over 22623.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01191, ecapa_loss=0.0002393, whisper_loss=0.0949, over 3887925.28 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:46:10,104 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 16:46:26,775 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 16:46:30,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-08-10 16:46:41,655 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 16:46:44,938 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
15 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 16:46:53,958 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 16:47:05,650 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 16:47:08,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=651620.0, ans=0.0 2024-08-10 16:47:13,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-10 16:47:17,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7200, loss[loss=0.09735, beats_loss=0.01183, ecapa_loss=0.0002153, whisper_loss=0.08336, over 14827.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01187, ecapa_loss=0.0002386, whisper_loss=0.0951, over 3897624.69 frames. ], batch size: 58, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:47:50,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651920.0, ans=0.125 2024-08-10 16:47:56,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.798e+01 3.257e+01 4.005e+01 1.167e+02, threshold=6.513e+01, percent-clipped=2.0 2024-08-10 16:48:00,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=652020.0, ans=0.0 2024-08-10 16:48:04,469 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 16:48:06,314 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 16:48:20,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=652120.0, ans=0.125 2024-08-10 16:48:26,570 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7250, loss[loss=0.1069, beats_loss=0.01008, ecapa_loss=0.0002617, whisper_loss=0.09419, over 15074.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01185, ecapa_loss=0.0002378, whisper_loss=0.09579, over 3903833.16 frames. ], batch size: 60, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:48:26,747 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 16:48:28,543 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 16:48:47,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.40 vs. limit=10.0 2024-08-10 16:49:13,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=652520.0, ans=0.125 2024-08-10 16:49:37,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652720.0, ans=0.1 2024-08-10 16:49:38,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7300, loss[loss=0.1083, beats_loss=0.01159, ecapa_loss=0.0002763, whisper_loss=0.09393, over 14000.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01175, ecapa_loss=0.0002376, whisper_loss=0.09662, over 3873540.00 frames. ], batch size: 54, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:49:44,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=652720.0, ans=0.125 2024-08-10 16:49:53,116 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 16:50:01,824 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 16:50:12,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=652920.0, ans=0.0 2024-08-10 16:50:16,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.741e+01 3.048e+01 3.552e+01 4.958e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-10 16:50:18,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=653020.0, ans=0.125 2024-08-10 16:50:28,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2024-08-10 16:50:38,967 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 27 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-10 16:50:39,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=653120.0, ans=0.0 2024-08-10 16:50:39,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=653120.0, ans=15.0 2024-08-10 16:50:41,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=653120.0, ans=0.0 2024-08-10 16:50:46,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7350, loss[loss=0.1181, beats_loss=0.008056, ecapa_loss=0.0003737, whisper_loss=0.1063, over 20427.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01176, ecapa_loss=0.0002382, whisper_loss=0.0955, over 3832814.67 frames. 
], batch size: 87, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:51:20,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=653420.0, ans=0.05 2024-08-10 16:51:30,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653520.0, ans=0.1 2024-08-10 16:51:37,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=653520.0, ans=0.125 2024-08-10 16:51:50,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=653620.0, ans=0.125 2024-08-10 16:51:51,677 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 16:51:57,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7400, loss[loss=0.107, beats_loss=0.01293, ecapa_loss=0.0002052, whisper_loss=0.09206, over 18782.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01175, ecapa_loss=0.0002378, whisper_loss=0.09604, over 3873037.09 frames. 
], batch size: 75, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:51:57,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=653720.0, ans=0.1 2024-08-10 16:52:21,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=653820.0, ans=0.2 2024-08-10 16:52:35,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.867e+01 3.212e+01 3.714e+01 5.750e+01, threshold=6.424e+01, percent-clipped=0.0 2024-08-10 16:52:46,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=654020.0, ans=0.125 2024-08-10 16:52:48,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=654020.0, ans=0.125 2024-08-10 16:52:58,598 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-10 16:53:01,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.04 vs. limit=22.5 2024-08-10 16:53:04,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7450, loss[loss=0.1123, beats_loss=0.01097, ecapa_loss=0.0002445, whisper_loss=0.09887, over 19868.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01183, ecapa_loss=0.0002397, whisper_loss=0.09464, over 3874833.20 frames. ], batch size: 79, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:53:06,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-08-10 16:53:11,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=654220.0, ans=0.125 2024-08-10 16:53:15,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654220.0, ans=0.125 2024-08-10 16:53:25,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=654320.0, ans=0.125 2024-08-10 16:53:26,386 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-10 16:53:42,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-10 16:53:42,944 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 16:53:47,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=654520.0, ans=0.2 2024-08-10 16:54:11,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7500, loss[loss=0.1232, beats_loss=0.01048, ecapa_loss=0.0002756, whisper_loss=0.1099, over 21483.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01186, ecapa_loss=0.0002378, whisper_loss=0.09541, over 3875255.09 frames. 
], batch size: 86, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:54:15,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654720.0, ans=0.1 2024-08-10 16:54:22,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654720.0, ans=0.125 2024-08-10 16:54:31,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=654820.0, ans=0.125 2024-08-10 16:54:33,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=654820.0, ans=0.05 2024-08-10 16:54:40,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=654920.0, ans=0.125 2024-08-10 16:54:44,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=654920.0, ans=0.125 2024-08-10 16:54:48,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.920e+01 3.227e+01 3.863e+01 6.212e+01, threshold=6.454e+01, percent-clipped=0.0 2024-08-10 16:54:52,630 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 16:54:55,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=655020.0, ans=0.0 2024-08-10 16:55:02,823 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 16:55:05,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=655120.0, ans=0.125 2024-08-10 16:55:17,252 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7550, loss[loss=0.08673, beats_loss=0.01269, ecapa_loss=0.000247, whisper_loss=0.07157, over 20788.00 frames. 
], tot_loss[loss=0.1094, beats_loss=0.01186, ecapa_loss=0.0002385, whisper_loss=0.09514, over 3861651.40 frames. ], batch size: 86, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:55:18,954 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 32 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 16:55:24,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655220.0, ans=0.125 2024-08-10 16:55:27,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-10 16:55:48,074 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 16:55:48,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-10 16:55:52,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=655420.0, ans=0.125 2024-08-10 16:55:52,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=655420.0, ans=0.125 2024-08-10 16:56:22,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=655620.0, ans=0.2 2024-08-10 16:56:24,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7600, loss[loss=0.09325, beats_loss=0.01337, ecapa_loss=0.0002309, whisper_loss=0.07757, over 21835.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01178, ecapa_loss=0.0002386, whisper_loss=0.09576, over 3879186.84 frames. ], batch size: 91, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:56:29,744 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 16:56:35,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=655720.0, ans=0.125 2024-08-10 16:56:48,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655820.0, ans=0.1 2024-08-10 16:57:02,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.834e+01 3.413e+01 3.883e+01 8.700e+01, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 16:57:06,574 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 16:57:23,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=656120.0, ans=0.0 2024-08-10 16:57:31,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7650, loss[loss=0.1228, beats_loss=0.01192, ecapa_loss=0.0002014, whisper_loss=0.1089, over 24324.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01177, ecapa_loss=0.0002368, whisper_loss=0.09629, over 3906933.14 frames. ], batch size: 91, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:57:35,172 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.652e+00 2024-08-10 16:57:37,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=12.0 2024-08-10 16:57:53,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=656320.0, ans=0.125 2024-08-10 16:57:55,806 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 16:58:06,158 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-10 16:58:18,072 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 16:58:20,599 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 16:58:29,767 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 16:58:31,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=656620.0, ans=0.125 2024-08-10 16:58:35,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=656620.0, ans=0.125 2024-08-10 16:58:37,475 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7700, loss[loss=0.09163, beats_loss=0.01371, ecapa_loss=0.0001526, whisper_loss=0.0764, over 22702.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01179, ecapa_loss=0.0002371, whisper_loss=0.09541, over 3928650.28 frames. ], batch size: 88, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:59:15,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.911e+01 3.362e+01 3.849e+01 6.405e+01, threshold=6.723e+01, percent-clipped=0.0 2024-08-10 16:59:27,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=657020.0, ans=0.0 2024-08-10 16:59:37,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-10 16:59:44,378 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7750, loss[loss=0.1009, beats_loss=0.01404, ecapa_loss=0.0002222, whisper_loss=0.08461, over 21659.00 frames. ], tot_loss[loss=0.109, beats_loss=0.0119, ecapa_loss=0.0002351, whisper_loss=0.09472, over 3935300.19 frames. 
], batch size: 90, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:59:58,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657320.0, ans=0.1 2024-08-10 17:00:16,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=657420.0, ans=0.2 2024-08-10 17:00:17,958 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 17:00:26,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=657520.0, ans=0.0 2024-08-10 17:00:30,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657520.0, ans=0.1 2024-08-10 17:00:31,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=657520.0, ans=0.125 2024-08-10 17:00:34,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=657520.0, ans=0.125 2024-08-10 17:00:50,131 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 17:00:51,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7800, loss[loss=0.1232, beats_loss=0.01202, ecapa_loss=0.0002292, whisper_loss=0.1088, over 22386.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.0119, ecapa_loss=0.0002358, whisper_loss=0.09498, over 3930538.43 frames. 
], batch size: 90, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:00:59,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=657720.0, ans=0.0 2024-08-10 17:01:02,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=657720.0, ans=0.0 2024-08-10 17:01:21,850 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 16 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 17:01:23,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=657920.0, ans=0.125 2024-08-10 17:01:28,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.784e+01 3.058e+01 3.552e+01 6.431e+01, threshold=6.115e+01, percent-clipped=0.0 2024-08-10 17:01:35,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658020.0, ans=0.1 2024-08-10 17:01:47,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=658120.0, ans=0.09899494936611666 2024-08-10 17:01:53,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=658120.0, ans=0.125 2024-08-10 17:01:57,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7850, loss[loss=0.1027, beats_loss=0.01005, ecapa_loss=0.0002051, whisper_loss=0.09058, over 17431.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01178, ecapa_loss=0.0002371, whisper_loss=0.09589, over 3917569.10 frames. 
], batch size: 65, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:02:01,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=658220.0, ans=0.125 2024-08-10 17:02:08,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-10 17:02:33,839 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 17:02:46,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658520.0, ans=0.1 2024-08-10 17:03:04,992 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7900, loss[loss=0.126, beats_loss=0.01089, ecapa_loss=0.0002446, whisper_loss=0.1127, over 17954.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01181, ecapa_loss=0.0002377, whisper_loss=0.09535, over 3899838.76 frames. ], batch size: 69, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:03:09,487 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 17:03:15,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=658720.0, ans=0.125 2024-08-10 17:03:30,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=658820.0, ans=0.0 2024-08-10 17:03:42,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.841e+01 3.204e+01 3.801e+01 5.785e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 17:03:44,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=659020.0, ans=0.125 2024-08-10 17:03:56,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659020.0, ans=0.1 2024-08-10 17:04:11,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659220.0, ans=0.1 2024-08-10 17:04:12,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 7950, loss[loss=0.09434, beats_loss=0.01171, ecapa_loss=0.0002561, whisper_loss=0.08007, over 20799.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01186, ecapa_loss=0.0002357, whisper_loss=0.09518, over 3900029.17 frames. ], batch size: 88, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:04:18,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=659220.0, ans=0.125 2024-08-10 17:04:32,556 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-10 17:04:49,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-08-10 17:04:53,660 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 17:04:55,077 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 17:04:56,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=659520.0, ans=0.04949747468305833 2024-08-10 17:05:04,631 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 17:05:07,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=659620.0, ans=0.0 2024-08-10 17:05:08,166 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 17:05:08,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659620.0, ans=0.1 2024-08-10 17:05:09,390 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 17:05:09,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659620.0, ans=0.125 2024-08-10 17:05:19,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8000, loss[loss=0.09592, beats_loss=0.01175, ecapa_loss=0.0002683, whisper_loss=0.08149, over 14057.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01187, ecapa_loss=0.0002352, whisper_loss=0.09502, over 3911403.26 frames. ], batch size: 58, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:05:37,523 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 17:05:38,199 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. 
limit=15.0 2024-08-10 17:05:49,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=659920.0, ans=0.0 2024-08-10 17:05:55,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.761e+01 3.157e+01 3.536e+01 5.933e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 17:05:59,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660020.0, ans=0.1 2024-08-10 17:06:00,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=660020.0, ans=0.07 2024-08-10 17:06:08,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=660020.0, ans=0.0 2024-08-10 17:06:14,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660120.0, ans=0.125 2024-08-10 17:06:25,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8050, loss[loss=0.1133, beats_loss=0.0111, ecapa_loss=0.0002338, whisper_loss=0.09986, over 21430.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01176, ecapa_loss=0.0002355, whisper_loss=0.09538, over 3883440.47 frames. 
], batch size: 85, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:07:13,899 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.654e+05 2024-08-10 17:07:21,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=660620.0, ans=0.5 2024-08-10 17:07:31,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=660720.0, ans=0.125 2024-08-10 17:07:32,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8100, loss[loss=0.1244, beats_loss=0.01061, ecapa_loss=0.0002496, whisper_loss=0.1113, over 23134.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0117, ecapa_loss=0.000239, whisper_loss=0.0959, over 3858053.51 frames. ], batch size: 92, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:07:33,970 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 17:07:49,861 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 17:07:53,766 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 17:07:55,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2024-08-10 17:08:01,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=660920.0, ans=0.125 2024-08-10 17:08:03,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=660920.0, ans=0.125 2024-08-10 17:08:09,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 3.015e+01 3.254e+01 3.867e+01 1.141e+02, threshold=6.509e+01, percent-clipped=2.0 2024-08-10 17:08:29,420 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 17:08:38,793 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8150, loss[loss=0.09179, beats_loss=0.009947, ecapa_loss=0.0003199, whisper_loss=0.07865, over 21444.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01167, ecapa_loss=0.0002416, whisper_loss=0.0952, over 3851734.92 frames. ], batch size: 87, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:08:50,225 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.193e-01 2024-08-10 17:08:54,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. 
limit=10.0 2024-08-10 17:08:56,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=661320.0, ans=0.0 2024-08-10 17:09:01,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=661320.0, ans=10.0 2024-08-10 17:09:10,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=661420.0, ans=0.125 2024-08-10 17:09:10,542 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=15.0 2024-08-10 17:09:24,395 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-10 17:09:42,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-10 17:09:44,347 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 34 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 17:09:45,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8200, loss[loss=0.1411, beats_loss=0.008187, ecapa_loss=0.0002575, whisper_loss=0.1303, over 20477.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01159, ecapa_loss=0.0002414, whisper_loss=0.09608, over 3881501.59 frames. ], batch size: 76, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:55,886 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 17:09:56,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:09:58,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.96 vs. 
limit=22.5 2024-08-10 17:10:21,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.877e+01 3.347e+01 3.681e+01 6.491e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-10 17:10:23,287 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 17:10:33,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662020.0, ans=0.125 2024-08-10 17:10:35,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=662020.0, ans=0.125 2024-08-10 17:10:35,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2024-08-10 17:10:36,445 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 17:10:50,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8250, loss[loss=0.0941, beats_loss=0.01348, ecapa_loss=0.0002157, whisper_loss=0.07847, over 16902.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01175, ecapa_loss=0.0002383, whisper_loss=0.09508, over 3899065.03 frames. ], batch size: 69, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:10:58,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.28 vs. limit=15.0 2024-08-10 17:11:01,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=662220.0, ans=0.125 2024-08-10 17:11:02,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=662220.0, ans=0.0 2024-08-10 17:11:21,092 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 17:11:23,801 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 17:11:25,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=662420.0, ans=0.125 2024-08-10 17:11:32,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=662520.0, ans=0.0 2024-08-10 17:11:39,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=662520.0, ans=0.125 2024-08-10 17:11:43,360 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 17:11:47,356 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 16 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-10 17:11:49,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662620.0, ans=0.0 2024-08-10 17:11:56,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0 2024-08-10 17:11:57,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8300, loss[loss=0.1359, beats_loss=0.008363, ecapa_loss=0.0004226, whisper_loss=0.1233, over 14304.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01172, ecapa_loss=0.0002381, whisper_loss=0.095, over 3891576.64 frames. ], batch size: 62, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:12:27,390 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 17:12:30,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=662920.0, ans=0.125 2024-08-10 17:12:34,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.908e+01 3.363e+01 4.143e+01 6.461e+01, threshold=6.726e+01, percent-clipped=0.0 2024-08-10 17:12:40,347 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 17:12:51,158 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 17:12:51,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=663120.0, ans=0.0 2024-08-10 17:12:56,048 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 31 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 17:13:04,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8350, loss[loss=0.09021, beats_loss=0.01514, ecapa_loss=0.0001657, whisper_loss=0.07341, over 14975.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01168, ecapa_loss=0.0002372, whisper_loss=0.09514, over 3866564.77 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:13:26,883 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-10 17:13:44,332 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 17:14:08,556 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 17:14:09,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=663620.0, ans=0.125 2024-08-10 17:14:15,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8400, loss[loss=0.1465, beats_loss=0.009855, ecapa_loss=0.0002015, whisper_loss=0.1346, over 24616.00 frames. 
], tot_loss[loss=0.1099, beats_loss=0.01166, ecapa_loss=0.0002383, whisper_loss=0.09589, over 3880115.67 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:14:24,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2024-08-10 17:14:45,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=663920.0, ans=0.125 2024-08-10 17:14:49,054 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 17:14:49,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663920.0, ans=0.1 2024-08-10 17:14:56,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.836e+01 3.172e+01 3.671e+01 5.154e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 17:15:02,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=664020.0, ans=0.0 2024-08-10 17:15:10,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664020.0, ans=0.1 2024-08-10 17:15:20,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5 2024-08-10 17:15:28,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=664220.0, ans=0.0 2024-08-10 17:15:29,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8450, loss[loss=0.1181, beats_loss=0.01164, ecapa_loss=0.0002648, whisper_loss=0.1038, over 22322.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01169, ecapa_loss=0.0002379, whisper_loss=0.09558, over 3886972.59 frames. 
], batch size: 93, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:15:37,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-10 17:15:40,076 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-10 17:16:00,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=664420.0, ans=0.025 2024-08-10 17:16:06,178 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 17:16:30,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-10 17:16:42,526 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8500, loss[loss=0.107, beats_loss=0.01284, ecapa_loss=0.0002109, whisper_loss=0.09201, over 14263.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01172, ecapa_loss=0.0002381, whisper_loss=0.09531, over 3853945.60 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:16:43,070 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 17:16:58,700 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 17:17:17,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=664920.0, ans=0.125 2024-08-10 17:17:20,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664920.0, ans=0.1 2024-08-10 17:17:26,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.856e+01 3.264e+01 3.786e+01 7.141e+01, threshold=6.528e+01, percent-clipped=1.0 2024-08-10 17:17:32,711 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 17:18:00,055 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8550, loss[loss=0.09789, beats_loss=0.01406, ecapa_loss=0.0002617, whisper_loss=0.08121, over 18348.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002382, whisper_loss=0.09489, over 3834840.04 frames. ], batch size: 77, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:18:26,416 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 17:18:38,325 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 17:18:46,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2024-08-10 17:19:10,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=665620.0, ans=0.125 2024-08-10 17:19:16,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8600, loss[loss=0.09929, beats_loss=0.01179, ecapa_loss=0.0002898, whisper_loss=0.0846, over 13403.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01178, ecapa_loss=0.0002372, whisper_loss=0.09503, over 3854706.80 frames. 
], batch size: 55, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:19:22,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2024-08-10 17:19:36,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=665820.0, ans=0.125 2024-08-10 17:20:05,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.897e+01 3.260e+01 3.635e+01 5.528e+01, threshold=6.520e+01, percent-clipped=0.0 2024-08-10 17:20:07,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=666020.0, ans=0.125 2024-08-10 17:20:13,223 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 17:20:25,599 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-10 17:20:33,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=666120.0, ans=0.2 2024-08-10 17:20:41,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8650, loss[loss=0.09913, beats_loss=0.01217, ecapa_loss=0.0002035, whisper_loss=0.08492, over 22795.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002387, whisper_loss=0.09483, over 3842453.48 frames. 
], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:20:45,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666220.0, ans=0.1 2024-08-10 17:20:48,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=666220.0, ans=0.125 2024-08-10 17:21:10,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=666320.0, ans=0.125 2024-08-10 17:21:15,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=666420.0, ans=0.125 2024-08-10 17:21:53,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-08-10 17:22:14,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8700, loss[loss=0.1184, beats_loss=0.0131, ecapa_loss=0.0002719, whisper_loss=0.1026, over 22144.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01177, ecapa_loss=0.0002387, whisper_loss=0.0951, over 3820781.65 frames. ], batch size: 94, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:22:41,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-10 17:22:53,519 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 17:23:16,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.997e+01 3.503e+01 4.044e+01 1.535e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 17:23:30,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=667020.0, ans=0.0 2024-08-10 17:23:42,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667120.0, ans=0.1 2024-08-10 17:23:44,734 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-10 17:23:56,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=667120.0, ans=0.125 2024-08-10 17:23:58,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8750, loss[loss=0.09767, beats_loss=0.01308, ecapa_loss=0.0001685, whisper_loss=0.0829, over 15888.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01167, ecapa_loss=0.0002398, whisper_loss=0.09541, over 3802742.85 frames. ], batch size: 58, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:24:03,102 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 17:24:44,588 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 17:25:04,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=667420.0, ans=0.125 2024-08-10 17:25:13,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=667520.0, ans=0.125 2024-08-10 17:25:22,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=667520.0, ans=0.0 2024-08-10 17:25:37,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=667620.0, ans=0.125 2024-08-10 17:25:57,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8800, loss[loss=0.1036, beats_loss=0.01411, ecapa_loss=0.0001853, whisper_loss=0.08761, over 23234.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01179, ecapa_loss=0.000238, whisper_loss=0.09549, over 3840189.54 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:26:28,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=667820.0, ans=0.2 2024-08-10 17:26:39,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=667820.0, ans=0.2 2024-08-10 17:26:51,606 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 17:27:04,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.842e+01 3.119e+01 3.570e+01 8.103e+01, threshold=6.239e+01, percent-clipped=1.0 2024-08-10 17:28:03,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8850, loss[loss=0.1153, beats_loss=0.009907, ecapa_loss=0.0002876, whisper_loss=0.1025, over 13798.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01176, ecapa_loss=0.0002366, whisper_loss=0.09542, over 3808252.49 frames. 
], batch size: 56, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:28:19,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=668220.0, ans=0.125 2024-08-10 17:28:32,034 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.089e+05 2024-08-10 17:28:57,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=668420.0, ans=0.0 2024-08-10 17:29:13,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=668420.0, ans=0.025 2024-08-10 17:29:15,817 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 17:29:37,165 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 17:29:37,562 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:29:40,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=12.0 2024-08-10 17:29:47,935 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 17:29:53,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8900, loss[loss=0.1228, beats_loss=0.01029, ecapa_loss=0.0002819, whisper_loss=0.1097, over 22579.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01168, ecapa_loss=0.000236, whisper_loss=0.09704, over 3832748.93 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:30:01,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=12.0 2024-08-10 17:30:09,541 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 17:30:12,255 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 27 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-10 17:30:20,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=668820.0, ans=0.2 2024-08-10 17:30:36,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.696e+01 3.082e+01 3.587e+01 7.840e+01, threshold=6.164e+01, percent-clipped=1.0 2024-08-10 17:31:03,222 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 40 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 17:31:11,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 8950, loss[loss=0.102, beats_loss=0.01169, ecapa_loss=0.0002395, whisper_loss=0.08787, over 21261.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01164, ecapa_loss=0.0002357, whisper_loss=0.09726, over 3845159.69 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:31:18,358 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 17:31:19,630 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 17:31:23,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=669220.0, ans=0.035 2024-08-10 17:31:23,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2024-08-10 17:31:57,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669520.0, ans=0.1 2024-08-10 17:32:09,219 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 17:32:27,586 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 17:32:28,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-10 17:32:28,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9000, loss[loss=0.1145, beats_loss=0.01103, ecapa_loss=0.0002354, whisper_loss=0.1011, over 22532.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0117, ecapa_loss=0.0002359, whisper_loss=0.09628, over 3823704.22 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:32:28,708 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 17:33:04,171 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on ASR_libri: loss=0.2625, beats_loss=0, ecapa_loss=0.0007367, whisper_loss=0.2551, over 922467.00 frames. 2024-08-10 17:33:18,171 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1886, 1.8463, 1.6455, 0.7095, 0.9314, 1.4620, 1.7051, 1.6802], device='cuda:1') 2024-08-10 17:33:20,266 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on SV_voxceleb1: loss=0.006282, beats_loss=0, ecapa_loss=0.0006282, whisper_loss=0, over 939242.00 frames. 2024-08-10 17:35:05,155 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on AT_audioset: loss=0.02673, beats_loss=0.02673, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 17:35:05,158 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 17:35:14,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=669720.0, ans=0.125 2024-08-10 17:35:17,061 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 17:35:18,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=669820.0, ans=0.0 2024-08-10 17:35:41,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=669920.0, ans=0.035 2024-08-10 17:35:47,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.797e+01 3.113e+01 3.593e+01 8.640e+01, threshold=6.226e+01, percent-clipped=2.0 2024-08-10 17:35:57,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=670020.0, ans=0.125 2024-08-10 17:36:21,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9050, loss[loss=0.1089, beats_loss=0.01209, ecapa_loss=0.0002394, whisper_loss=0.09443, over 21848.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01173, ecapa_loss=0.000236, whisper_loss=0.09546, over 3832335.23 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:36:21,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=670220.0, ans=0.0 2024-08-10 17:37:12,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=670520.0, ans=0.125 2024-08-10 17:37:18,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670520.0, ans=0.1 2024-08-10 17:37:19,295 INFO [train_multi_KD3.py:844] (1/4) A total of 98 cuts. 23 from LS+wenet, 29 from Vox, 46 fro AS 2024-08-10 17:37:21,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-10 17:37:27,303 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
23 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 17:37:35,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9100, loss[loss=0.1062, beats_loss=0.01009, ecapa_loss=0.0003008, whisper_loss=0.09307, over 18959.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01185, ecapa_loss=0.0002357, whisper_loss=0.09474, over 3852344.88 frames. ], batch size: 82, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:37:41,619 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 17:37:42,847 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 17:37:54,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=670820.0, ans=0.125 2024-08-10 17:37:55,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-10 17:37:58,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670820.0, ans=0.0 2024-08-10 17:38:02,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=670820.0, ans=0.2 2024-08-10 17:38:08,277 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 17:38:08,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2024-08-10 17:38:11,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=15.0 2024-08-10 17:38:12,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=670920.0, ans=0.0 2024-08-10 17:38:16,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.910e+01 3.253e+01 3.723e+01 6.048e+01, threshold=6.507e+01, percent-clipped=0.0 2024-08-10 17:38:38,667 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 17:38:41,810 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 17:38:45,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-10 17:38:49,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9150, loss[loss=0.1026, beats_loss=0.0136, ecapa_loss=0.000254, whisper_loss=0.08649, over 19446.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01172, ecapa_loss=0.0002349, whisper_loss=0.0955, over 3878139.76 frames. ], batch size: 79, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:38:49,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=671220.0, ans=0.2 2024-08-10 17:39:00,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=671220.0, ans=0.0 2024-08-10 17:39:12,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=671320.0, ans=0.2 2024-08-10 17:39:41,316 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 17:39:56,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=15.0 2024-08-10 17:40:09,794 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9200, loss[loss=0.07523, beats_loss=0.01181, ecapa_loss=0.0002963, whisper_loss=0.06045, over 14209.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01163, ecapa_loss=0.0002359, whisper_loss=0.0961, over 3896984.62 frames. ], batch size: 60, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:40:43,448 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 17:40:44,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=671920.0, ans=0.125 2024-08-10 17:40:45,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=671920.0, ans=0.2 2024-08-10 17:40:48,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=671920.0, ans=0.0 2024-08-10 17:40:49,239 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 17:40:53,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.732e+01 3.100e+01 3.483e+01 6.432e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 17:41:26,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=672220.0, ans=0.0 2024-08-10 17:41:27,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9250, loss[loss=0.123, beats_loss=0.01225, ecapa_loss=0.0002561, whisper_loss=0.1082, over 21827.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01162, ecapa_loss=0.0002369, whisper_loss=0.09587, over 3879586.07 frames. ], batch size: 87, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:41:28,500 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
20 from LS+wenet, 23 from Vox, 51 fro AS 2024-08-10 17:41:32,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-10 17:41:37,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672220.0, ans=0.1 2024-08-10 17:41:47,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-10 17:42:05,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=672420.0, ans=0.125 2024-08-10 17:42:17,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=672520.0, ans=0.0 2024-08-10 17:42:18,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-10 17:42:30,772 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 17:42:41,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=672720.0, ans=0.0 2024-08-10 17:42:41,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-08-10 17:42:42,569 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9300, loss[loss=0.07772, beats_loss=0.01481, ecapa_loss=0.0002085, whisper_loss=0.06082, over 19529.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.0117, ecapa_loss=0.0002358, whisper_loss=0.09565, over 3876794.16 frames. 
], batch size: 78, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:42:44,560 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 17:43:04,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672820.0, ans=0.0 2024-08-10 17:43:13,553 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 17:43:13,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=672920.0, ans=0.0 2024-08-10 17:43:28,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.985e+01 3.331e+01 3.923e+01 7.099e+01, threshold=6.662e+01, percent-clipped=2.0 2024-08-10 17:43:30,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=673020.0, ans=0.125 2024-08-10 17:44:00,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=673120.0, ans=0.125 2024-08-10 17:44:05,416 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9350, loss[loss=0.08232, beats_loss=0.01173, ecapa_loss=0.0003021, whisper_loss=0.06757, over 15280.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01167, ecapa_loss=0.0002362, whisper_loss=0.09536, over 3864986.17 frames. 
], batch size: 64, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:44:13,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=673220.0, ans=0.0 2024-08-10 17:44:21,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=673320.0, ans=0.125 2024-08-10 17:44:24,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=673320.0, ans=0.125 2024-08-10 17:44:50,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=673420.0, ans=0.0 2024-08-10 17:44:53,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=673520.0, ans=0.0 2024-08-10 17:45:19,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=673620.0, ans=0.125 2024-08-10 17:45:22,854 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9400, loss[loss=0.09105, beats_loss=0.01544, ecapa_loss=0.0001707, whisper_loss=0.0739, over 23451.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01167, ecapa_loss=0.0002371, whisper_loss=0.095, over 3872336.66 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:45:23,004 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 17:45:28,915 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 17:45:35,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=673720.0, ans=0.0 2024-08-10 17:46:05,062 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.795e+01 3.115e+01 3.725e+01 7.083e+01, threshold=6.231e+01, percent-clipped=1.0 2024-08-10 17:46:09,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=674020.0, ans=0.125 2024-08-10 17:46:12,514 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-10 17:46:14,234 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 17:46:24,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=674120.0, ans=0.125 2024-08-10 17:46:35,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=674120.0, ans=0.0 2024-08-10 17:46:36,830 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9450, loss[loss=0.1353, beats_loss=0.01097, ecapa_loss=0.0002467, whisper_loss=0.1218, over 23167.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01167, ecapa_loss=0.0002376, whisper_loss=0.09518, over 3870787.17 frames. ], batch size: 90, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:46:38,573 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 17:46:38,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=674220.0, ans=0.125 2024-08-10 17:46:53,286 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 17:46:56,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=674320.0, ans=0.125 2024-08-10 17:47:15,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=674420.0, ans=0.125 2024-08-10 17:47:38,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=674620.0, ans=0.0 2024-08-10 17:47:48,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9500, loss[loss=0.07223, beats_loss=0.01384, ecapa_loss=0.0002429, whisper_loss=0.05595, over 19193.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01172, ecapa_loss=0.0002388, whisper_loss=0.09513, over 3894603.01 frames. ], batch size: 79, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:47:52,611 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 17:47:56,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2024-08-10 17:48:05,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=674820.0, ans=0.05 2024-08-10 17:48:13,900 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 17:48:22,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=674920.0, ans=0.2 2024-08-10 17:48:32,130 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 17:48:32,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=674920.0, ans=0.04949747468305833 2024-08-10 17:48:33,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.877e+01 3.250e+01 3.723e+01 7.953e+01, threshold=6.499e+01, percent-clipped=3.0 2024-08-10 17:48:34,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-10 17:48:47,633 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 17:48:58,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=675120.0, ans=0.125 2024-08-10 17:49:05,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9550, loss[loss=0.1102, beats_loss=0.01181, ecapa_loss=0.0002867, whisper_loss=0.09555, over 22025.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01174, ecapa_loss=0.0002383, whisper_loss=0.09492, over 3888075.20 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:49:29,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=675320.0, ans=0.125 2024-08-10 17:49:45,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=675420.0, ans=0.0 2024-08-10 17:49:51,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=675520.0, ans=0.125 2024-08-10 17:49:55,713 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-10 17:49:59,223 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 17:50:03,581 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 17:50:08,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-10 17:50:20,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=675720.0, ans=0.0 2024-08-10 17:50:21,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9600, loss[loss=0.1045, beats_loss=0.01326, ecapa_loss=0.0002519, whisper_loss=0.08869, over 20708.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01165, ecapa_loss=0.0002383, whisper_loss=0.09555, over 3899827.23 frames. ], batch size: 88, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:50:48,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-10 17:50:55,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-10 17:50:58,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=675920.0, ans=0.125 2024-08-10 17:51:00,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. 
limit=12.0 2024-08-10 17:51:02,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.698e+01 2.997e+01 3.348e+01 4.884e+01, threshold=5.995e+01, percent-clipped=0.0 2024-08-10 17:51:04,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=676020.0, ans=0.125 2024-08-10 17:51:10,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=676020.0, ans=0.2 2024-08-10 17:51:24,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=676120.0, ans=0.0 2024-08-10 17:51:32,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9650, loss[loss=0.09199, beats_loss=0.01231, ecapa_loss=0.0002506, whisper_loss=0.07717, over 21966.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01159, ecapa_loss=0.0002387, whisper_loss=0.09572, over 3890612.57 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:51:32,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=676220.0, ans=0.0 2024-08-10 17:51:55,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=676320.0, ans=0.0 2024-08-10 17:51:58,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=676320.0, ans=0.2 2024-08-10 17:52:20,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=676520.0, ans=0.125 2024-08-10 17:52:29,558 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 17:52:34,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676620.0, ans=0.1 2024-08-10 17:52:43,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=676620.0, ans=0.0 2024-08-10 17:52:44,144 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 17:52:45,188 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9700, loss[loss=0.1064, beats_loss=0.011, ecapa_loss=0.0002537, whisper_loss=0.0929, over 15089.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01164, ecapa_loss=0.0002387, whisper_loss=0.09548, over 3900705.53 frames. ], batch size: 61, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:52:45,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=676720.0, ans=0.04949747468305833 2024-08-10 17:53:05,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2024-08-10 17:53:09,905 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 17:53:11,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=676820.0, ans=0.0 2024-08-10 17:53:22,881 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.645e-02 2024-08-10 17:53:27,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.851e+01 3.065e+01 3.509e+01 5.015e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 17:53:28,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. 
limit=15.0 2024-08-10 17:53:32,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=677020.0, ans=0.125 2024-08-10 17:53:44,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=677120.0, ans=0.125 2024-08-10 17:53:59,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9750, loss[loss=0.1142, beats_loss=0.01093, ecapa_loss=0.0001877, whisper_loss=0.1014, over 15120.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01167, ecapa_loss=0.0002373, whisper_loss=0.09532, over 3892073.30 frames. ], batch size: 55, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:54:07,297 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 17:54:16,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2024-08-10 17:54:28,691 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-10 17:54:53,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=677520.0, ans=0.0 2024-08-10 17:54:58,552 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 17:55:06,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=677620.0, ans=0.2 2024-08-10 17:55:10,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=677620.0, ans=0.0 2024-08-10 17:55:12,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9800, loss[loss=0.1333, beats_loss=0.009071, ecapa_loss=0.0003196, whisper_loss=0.121, over 22502.00 frames. 
], tot_loss[loss=0.1095, beats_loss=0.01168, ecapa_loss=0.0002361, whisper_loss=0.09543, over 3881571.22 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:55:17,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=677720.0, ans=0.0 2024-08-10 17:55:23,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=677720.0, ans=0.0 2024-08-10 17:55:24,913 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 17:55:28,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=677820.0, ans=0.125 2024-08-10 17:55:32,537 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 17:55:33,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.18 vs. 
limit=12.0 2024-08-10 17:55:40,058 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:55:41,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=677920.0, ans=0.125 2024-08-10 17:55:45,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=677920.0, ans=0.09899494936611666 2024-08-10 17:55:50,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=677920.0, ans=0.1 2024-08-10 17:55:53,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677920.0, ans=0.1 2024-08-10 17:55:54,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.700e+01 3.065e+01 3.596e+01 6.450e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-10 17:56:07,779 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 17:56:11,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-08-10 17:56:17,277 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 17:56:26,145 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9850, loss[loss=0.1508, beats_loss=0.008506, ecapa_loss=0.0002183, whisper_loss=0.1401, over 24842.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002354, whisper_loss=0.09484, over 3859203.59 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:56:38,489 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-10 17:57:04,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=678420.0, ans=0.2 2024-08-10 17:57:12,202 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:57:36,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-10 17:57:41,311 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9900, loss[loss=0.1172, beats_loss=0.01155, ecapa_loss=0.0002699, whisper_loss=0.103, over 16628.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01177, ecapa_loss=0.0002358, whisper_loss=0.09516, over 3870805.33 frames. ], batch size: 69, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:58:01,310 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.547e-01 2024-08-10 17:58:12,040 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 17:58:19,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.724e+01 3.027e+01 3.695e+01 5.994e+01, threshold=6.053e+01, percent-clipped=0.0 2024-08-10 17:58:29,350 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 17:58:33,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.81 vs. 
limit=15.0 2024-08-10 17:58:36,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=679120.0, ans=0.2 2024-08-10 17:58:46,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679120.0, ans=0.1 2024-08-10 17:58:50,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 9950, loss[loss=0.1177, beats_loss=0.01474, ecapa_loss=0.0002127, whisper_loss=0.1009, over 23527.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01181, ecapa_loss=0.0002357, whisper_loss=0.09548, over 3870267.19 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:58:52,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679220.0, ans=0.1 2024-08-10 17:58:58,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=679220.0, ans=0.125 2024-08-10 17:59:08,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=679320.0, ans=0.125 2024-08-10 17:59:17,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=679320.0, ans=0.125 2024-08-10 17:59:18,648 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 17:59:18,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=679420.0, ans=0.1 2024-08-10 17:59:28,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=679420.0, ans=0.125 2024-08-10 17:59:35,456 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 17:59:36,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=679520.0, ans=0.125 2024-08-10 17:59:52,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.63 vs. limit=15.0 2024-08-10 17:59:53,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=679620.0, ans=0.0 2024-08-10 18:00:04,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10000, loss[loss=0.1109, beats_loss=0.01297, ecapa_loss=0.0002452, whisper_loss=0.09544, over 21749.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01187, ecapa_loss=0.0002362, whisper_loss=0.09495, over 3880024.67 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 18:00:09,006 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 18:00:25,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679820.0, ans=0.1 2024-08-10 18:00:29,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=679820.0, ans=0.125 2024-08-10 18:00:32,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=679920.0, ans=0.125 2024-08-10 18:00:35,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=679920.0, ans=0.05 2024-08-10 18:00:36,996 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.094e+00 2024-08-10 18:00:47,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.793e+01 3.118e+01 3.876e+01 5.816e+01, threshold=6.237e+01, percent-clipped=0.0 2024-08-10 18:00:53,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680020.0, ans=0.1 2024-08-10 18:00:54,152 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 18:01:05,665 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 38 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 18:01:16,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=680220.0, ans=0.125 2024-08-10 18:01:17,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10050, loss[loss=0.1106, beats_loss=0.009132, ecapa_loss=0.0003114, whisper_loss=0.09832, over 21222.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01185, ecapa_loss=0.0002349, whisper_loss=0.09517, over 3890216.19 frames. 
], batch size: 90, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:01:35,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0 2024-08-10 18:01:43,988 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 18:02:12,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-10 18:02:20,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680620.0, ans=0.1 2024-08-10 18:02:30,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10100, loss[loss=0.1178, beats_loss=0.01049, ecapa_loss=0.0002666, whisper_loss=0.1047, over 21775.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0118, ecapa_loss=0.0002363, whisper_loss=0.09494, over 3884715.62 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:02:35,245 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.084e-01 2024-08-10 18:02:42,440 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 18:02:43,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0 2024-08-10 18:02:44,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=680820.0, ans=0.125 2024-08-10 18:02:46,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=680820.0, ans=0.0 2024-08-10 18:02:53,059 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 18:02:57,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=680820.0, ans=0.1 2024-08-10 18:03:07,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680920.0, ans=0.1 2024-08-10 18:03:07,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-08-10 18:03:08,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=12.0 2024-08-10 18:03:12,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.905e+01 3.182e+01 3.646e+01 5.979e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-10 18:03:12,812 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 18:03:14,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.05 vs. 
limit=15.0 2024-08-10 18:03:19,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=681020.0, ans=0.0 2024-08-10 18:03:36,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681120.0, ans=0.1 2024-08-10 18:03:42,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=681120.0, ans=0.2 2024-08-10 18:03:42,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681120.0, ans=0.1 2024-08-10 18:03:48,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10150, loss[loss=0.116, beats_loss=0.01315, ecapa_loss=0.0002288, whisper_loss=0.1005, over 22495.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01182, ecapa_loss=0.0002378, whisper_loss=0.09457, over 3898700.16 frames. ], batch size: 94, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:03:50,038 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.905e+03 2024-08-10 18:03:54,618 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 18:04:37,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=681520.0, ans=0.125 2024-08-10 18:04:49,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=681520.0, ans=0.125 2024-08-10 18:04:58,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-08-10 18:05:05,690 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 18:05:08,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=681720.0, ans=0.0 2024-08-10 18:05:09,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10200, loss[loss=0.09372, beats_loss=0.01319, ecapa_loss=0.0002754, whisper_loss=0.07778, over 21457.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01171, ecapa_loss=0.000238, whisper_loss=0.0949, over 3865587.72 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:05:16,443 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 18:05:19,380 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-10 18:05:30,204 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 18:05:45,698 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 18:05:54,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.852e+01 3.122e+01 3.821e+01 7.643e+01, threshold=6.244e+01, percent-clipped=3.0 2024-08-10 18:06:01,622 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 18:06:11,062 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 18:06:12,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=682120.0, ans=0.0 2024-08-10 18:06:23,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=682120.0, ans=0.0 2024-08-10 18:06:28,074 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10250, loss[loss=0.1351, beats_loss=0.008783, ecapa_loss=0.0002323, whisper_loss=0.124, over 18224.00 frames. 
], tot_loss[loss=0.1091, beats_loss=0.01172, ecapa_loss=0.0002356, whisper_loss=0.09502, over 3867371.82 frames. ], batch size: 70, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:06:37,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=682220.0, ans=0.035 2024-08-10 18:06:47,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-08-10 18:07:07,819 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.991e+00 2024-08-10 18:07:13,314 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 18:07:37,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682620.0, ans=0.1 2024-08-10 18:07:43,670 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 18:07:46,384 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10300, loss[loss=0.09145, beats_loss=0.01358, ecapa_loss=0.0001907, whisper_loss=0.07597, over 19210.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.0117, ecapa_loss=0.000234, whisper_loss=0.09519, over 3881206.91 frames. ], batch size: 73, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:07:51,611 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 18:07:55,147 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
16 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-10 18:08:11,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=682820.0, ans=0.125 2024-08-10 18:08:19,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=15.0 2024-08-10 18:08:26,152 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 18:08:29,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.976e+01 3.282e+01 3.794e+01 5.948e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 18:08:34,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-10 18:08:40,429 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 18:08:44,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.92 vs. limit=15.0 2024-08-10 18:09:02,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10350, loss[loss=0.104, beats_loss=0.01363, ecapa_loss=0.0001976, whisper_loss=0.08844, over 20454.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01183, ecapa_loss=0.0002333, whisper_loss=0.09449, over 3896947.67 frames. ], batch size: 79, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:09:16,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683320.0, ans=0.0 2024-08-10 18:09:25,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=683320.0, ans=0.035 2024-08-10 18:09:27,559 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
27 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 18:09:28,878 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 18:09:48,125 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 18:10:20,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10400, loss[loss=0.09473, beats_loss=0.0108, ecapa_loss=0.0002757, whisper_loss=0.08118, over 14155.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01187, ecapa_loss=0.0002334, whisper_loss=0.09404, over 3894441.57 frames. ], batch size: 57, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:10:39,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683820.0, ans=0.1 2024-08-10 18:10:47,338 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 18:10:55,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.51 vs. limit=22.5 2024-08-10 18:11:02,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.804e+01 3.184e+01 3.674e+01 7.007e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 18:11:15,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=684020.0, ans=0.125 2024-08-10 18:11:21,014 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 40 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 18:11:34,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10450, loss[loss=0.1098, beats_loss=0.01027, ecapa_loss=0.0002612, whisper_loss=0.09696, over 21134.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01183, ecapa_loss=0.0002339, whisper_loss=0.0945, over 3887604.27 frames. 
], batch size: 89, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:11:47,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=684220.0, ans=0.0 2024-08-10 18:12:03,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-10 18:12:34,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=684520.0, ans=0.125 2024-08-10 18:12:34,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-10 18:12:50,968 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 18:12:54,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10500, loss[loss=0.131, beats_loss=0.011, ecapa_loss=0.0002113, whisper_loss=0.1179, over 22462.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01182, ecapa_loss=0.0002356, whisper_loss=0.09437, over 3886557.20 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:13:29,905 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 18:13:35,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.920e+01 3.144e+01 3.815e+01 6.100e+01, threshold=6.288e+01, percent-clipped=0.0 2024-08-10 18:13:45,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685020.0, ans=0.0 2024-08-10 18:14:09,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10550, loss[loss=0.1206, beats_loss=0.01188, ecapa_loss=0.0002212, whisper_loss=0.1065, over 20551.00 frames. 
], tot_loss[loss=0.1079, beats_loss=0.01185, ecapa_loss=0.0002366, whisper_loss=0.09371, over 3851819.25 frames. ], batch size: 79, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:14:17,306 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 18:14:19,029 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 18:14:24,309 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 18:14:49,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=685420.0, ans=15.0 2024-08-10 18:15:04,973 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 18:15:11,573 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 18:15:12,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-10 18:15:12,748 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 18:15:28,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0 2024-08-10 18:15:28,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10600, loss[loss=0.09159, beats_loss=0.01251, ecapa_loss=0.0002186, whisper_loss=0.07689, over 17636.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01187, ecapa_loss=0.0002373, whisper_loss=0.09328, over 3862292.04 frames. ], batch size: 68, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:15:30,662 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-10 18:15:36,973 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 18:15:38,196 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-10 18:15:46,413 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-10 18:16:06,000 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 18:16:06,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.04 vs. limit=22.5 2024-08-10 18:16:12,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.764e+01 3.108e+01 3.489e+01 4.887e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-10 18:16:25,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686020.0, ans=0.1 2024-08-10 18:16:27,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=686020.0, ans=0.1 2024-08-10 18:16:30,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=686120.0, ans=0.0 2024-08-10 18:16:46,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10650, loss[loss=0.1133, beats_loss=0.008545, ecapa_loss=0.0002603, whisper_loss=0.1022, over 23248.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01184, ecapa_loss=0.0002356, whisper_loss=0.09309, over 3818507.24 frames. ], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:16:48,189 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 18:16:59,316 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 18:17:05,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=686320.0, ans=0.0 2024-08-10 18:17:12,396 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 18:17:13,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-08-10 18:17:24,001 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 18:17:31,482 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 18:17:45,985 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 18:17:51,012 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:18:04,525 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10700, loss[loss=0.1174, beats_loss=0.01124, ecapa_loss=0.0002319, whisper_loss=0.1038, over 18598.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01183, ecapa_loss=0.000235, whisper_loss=0.09324, over 3836130.98 frames. ], batch size: 75, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:18:11,859 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 18:18:38,537 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 18:18:45,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. 
limit=22.5 2024-08-10 18:18:47,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.883e+01 3.231e+01 3.765e+01 5.379e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-10 18:18:54,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=687020.0, ans=0.0 2024-08-10 18:18:54,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=687020.0, ans=0.125 2024-08-10 18:18:56,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=687020.0, ans=0.0 2024-08-10 18:19:11,463 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 18:19:13,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-10 18:19:23,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10750, loss[loss=0.1244, beats_loss=0.01177, ecapa_loss=0.0002294, whisper_loss=0.1104, over 18050.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01169, ecapa_loss=0.0002359, whisper_loss=0.09519, over 3870145.76 frames. ], batch size: 69, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:19:33,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2024-08-10 18:19:44,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687320.0, ans=0.125 2024-08-10 18:19:45,201 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 18:20:05,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=687420.0, ans=0.125 2024-08-10 18:20:07,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=687420.0, ans=0.125 2024-08-10 18:20:09,718 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 18:20:15,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=687520.0, ans=0.125 2024-08-10 18:20:31,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=687620.0, ans=0.2 2024-08-10 18:20:40,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10800, loss[loss=0.07962, beats_loss=0.01244, ecapa_loss=0.0002185, whisper_loss=0.065, over 19614.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01174, ecapa_loss=0.0002348, whisper_loss=0.09531, over 3913719.26 frames. 
], batch size: 79, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:20:55,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=687820.0, ans=0.125 2024-08-10 18:21:23,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 2.760e+01 3.130e+01 3.473e+01 5.037e+01, threshold=6.260e+01, percent-clipped=0.0 2024-08-10 18:21:34,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=688020.0, ans=0.125 2024-08-10 18:21:43,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=688120.0, ans=0.125 2024-08-10 18:21:57,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10850, loss[loss=0.1116, beats_loss=0.01221, ecapa_loss=0.0002204, whisper_loss=0.09719, over 21945.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01181, ecapa_loss=0.0002346, whisper_loss=0.09542, over 3904285.40 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:22:18,603 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 18:22:25,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=688320.0, ans=0.125 2024-08-10 18:22:39,574 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 18:22:41,106 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 18:22:48,978 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 18:23:15,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10900, loss[loss=0.1184, beats_loss=0.009313, ecapa_loss=0.0002572, whisper_loss=0.1065, over 19199.00 frames. 
], tot_loss[loss=0.1093, beats_loss=0.01182, ecapa_loss=0.0002343, whisper_loss=0.09511, over 3928350.36 frames. ], batch size: 78, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:23:18,108 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-10 18:23:21,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-10 18:24:02,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.842e+01 3.313e+01 3.977e+01 6.808e+01, threshold=6.627e+01, percent-clipped=2.0 2024-08-10 18:24:02,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2024-08-10 18:24:03,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=689020.0, ans=0.0 2024-08-10 18:24:15,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=689020.0, ans=0.1 2024-08-10 18:24:31,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2024-08-10 18:24:36,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 10950, loss[loss=0.1181, beats_loss=0.0113, ecapa_loss=0.0002554, whisper_loss=0.1042, over 21084.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01176, ecapa_loss=0.0002346, whisper_loss=0.09508, over 3925531.17 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:24:38,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. 
limit=15.0 2024-08-10 18:24:56,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-10 18:24:59,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=689320.0, ans=0.125 2024-08-10 18:25:19,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2024-08-10 18:25:21,877 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.904e-01 2024-08-10 18:25:24,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=689520.0, ans=0.2 2024-08-10 18:25:32,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=689520.0, ans=0.125 2024-08-10 18:25:47,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689620.0, ans=0.0 2024-08-10 18:25:55,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11000, loss[loss=0.1122, beats_loss=0.009545, ecapa_loss=0.0002823, whisper_loss=0.09978, over 20145.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01173, ecapa_loss=0.000236, whisper_loss=0.09531, over 3929760.21 frames. 
], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:26:03,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=689720.0, ans=0.125 2024-08-10 18:26:15,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=689820.0, ans=0.0 2024-08-10 18:26:17,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=689820.0, ans=0.2 2024-08-10 18:26:19,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=689820.0, ans=0.125 2024-08-10 18:26:34,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-10 18:26:41,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.857e+01 3.230e+01 3.620e+01 6.298e+01, threshold=6.460e+01, percent-clipped=0.0 2024-08-10 18:26:48,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-10 18:26:59,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-08-10 18:27:06,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=690120.0, ans=0.0 2024-08-10 18:27:16,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11050, loss[loss=0.1354, beats_loss=0.009487, ecapa_loss=0.0001771, whisper_loss=0.1241, over 18581.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01164, ecapa_loss=0.0002372, whisper_loss=0.09596, over 3945700.49 frames. 
], batch size: 66, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:27:17,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690220.0, ans=0.1 2024-08-10 18:27:28,101 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 18:27:32,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=690220.0, ans=0.2 2024-08-10 18:27:38,406 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 18:27:39,991 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 10 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 18:27:42,798 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 18:27:51,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=690420.0, ans=0.0 2024-08-10 18:27:52,945 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 18:27:55,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0 2024-08-10 18:28:05,819 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 11 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 18:28:36,875 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11100, loss[loss=0.1073, beats_loss=0.01094, ecapa_loss=0.0002353, whisper_loss=0.09404, over 17642.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01166, ecapa_loss=0.0002365, whisper_loss=0.09547, over 3913948.40 frames. ], batch size: 70, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:28:56,487 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 18:28:59,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-08-10 18:29:00,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=690820.0, ans=0.09899494936611666 2024-08-10 18:29:04,800 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 35 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 18:29:13,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=690920.0, ans=0.2 2024-08-10 18:29:18,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.710e+01 3.196e+01 3.800e+01 5.125e+01, threshold=6.392e+01, percent-clipped=0.0 2024-08-10 18:29:27,834 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 18:29:37,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=691120.0, ans=0.125 2024-08-10 18:29:54,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11150, loss[loss=0.1235, beats_loss=0.01118, ecapa_loss=0.0001882, whisper_loss=0.1105, over 18016.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01164, ecapa_loss=0.0002341, whisper_loss=0.09595, over 3915763.06 frames. ], batch size: 65, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:29:56,368 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 18:30:22,802 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 29 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 18:30:44,919 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-10 18:30:54,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=691520.0, ans=0.07 2024-08-10 18:30:58,662 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 18:31:14,083 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11200, loss[loss=0.1097, beats_loss=0.01231, ecapa_loss=0.0002102, whisper_loss=0.09524, over 17098.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01159, ecapa_loss=0.000234, whisper_loss=0.09664, over 3917045.09 frames. ], batch size: 66, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:31:14,292 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 27 from Vox, 18 fro AS 2024-08-10 18:31:25,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-10 18:31:30,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=691820.0, ans=0.0 2024-08-10 18:31:31,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691820.0, ans=0.1 2024-08-10 18:31:56,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.779e+01 3.196e+01 3.588e+01 6.419e+01, threshold=6.392e+01, percent-clipped=1.0 2024-08-10 18:32:00,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=692020.0, ans=0.0 2024-08-10 18:32:01,956 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 18:32:06,715 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 18:32:12,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=692020.0, ans=0.2 2024-08-10 18:32:23,399 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 18:32:31,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11250, loss[loss=0.1093, beats_loss=0.01156, ecapa_loss=0.0002762, whisper_loss=0.09494, over 15001.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01157, ecapa_loss=0.0002349, whisper_loss=0.09641, over 3902072.92 frames. ], batch size: 61, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:32:43,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=692220.0, ans=0.125 2024-08-10 18:32:44,787 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 18:32:48,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=692320.0, ans=0.2 2024-08-10 18:33:20,463 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-10 18:33:22,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-10 18:33:37,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=692620.0, ans=0.125 2024-08-10 18:33:51,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11300, loss[loss=0.1173, beats_loss=0.01011, ecapa_loss=0.0002254, whisper_loss=0.1049, over 20457.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01161, ecapa_loss=0.0002336, whisper_loss=0.09628, over 3915077.73 frames. 
], batch size: 80, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:33:54,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=692720.0, ans=0.125 2024-08-10 18:34:07,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=692820.0, ans=0.0 2024-08-10 18:34:09,222 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 18:34:09,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=692820.0, ans=0.0 2024-08-10 18:34:09,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=692820.0, ans=0.2 2024-08-10 18:34:16,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=692820.0, ans=0.0 2024-08-10 18:34:31,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=692920.0, ans=0.125 2024-08-10 18:34:35,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.891e+01 3.346e+01 3.835e+01 5.621e+01, threshold=6.692e+01, percent-clipped=0.0 2024-08-10 18:34:52,911 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 18:35:05,084 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 18:35:09,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11350, loss[loss=0.1233, beats_loss=0.01112, ecapa_loss=0.0002791, whisper_loss=0.1094, over 21935.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01159, ecapa_loss=0.0002366, whisper_loss=0.09516, over 3939198.31 frames. 
], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:35:12,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693220.0, ans=0.1 2024-08-10 18:35:16,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693220.0, ans=0.125 2024-08-10 18:35:22,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=693220.0, ans=0.125 2024-08-10 18:35:45,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-10 18:35:47,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=693420.0, ans=0.0 2024-08-10 18:36:06,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=693520.0, ans=0.125 2024-08-10 18:36:19,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-10 18:36:24,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=693720.0, ans=0.125 2024-08-10 18:36:24,905 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11400, loss[loss=0.1042, beats_loss=0.01185, ecapa_loss=0.0002021, whisper_loss=0.09032, over 14949.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01152, ecapa_loss=0.000236, whisper_loss=0.09613, over 3927909.35 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:36:49,814 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 18:37:00,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-10 18:37:03,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=693920.0, ans=15.0 2024-08-10 18:37:07,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.890e+01 3.279e+01 3.857e+01 6.641e+01, threshold=6.557e+01, percent-clipped=0.0 2024-08-10 18:37:07,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693920.0, ans=0.1 2024-08-10 18:37:22,289 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 18:37:26,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.32 vs. limit=5.0 2024-08-10 18:37:30,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=694120.0, ans=0.0 2024-08-10 18:37:39,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11450, loss[loss=0.09881, beats_loss=0.01279, ecapa_loss=0.0002171, whisper_loss=0.08385, over 17416.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01168, ecapa_loss=0.0002351, whisper_loss=0.09581, over 3954633.91 frames. ], batch size: 69, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:37:39,827 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 18:37:44,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=694220.0, ans=10.0 2024-08-10 18:37:44,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=694220.0, ans=0.125 2024-08-10 18:37:51,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=694220.0, ans=0.125 2024-08-10 18:37:54,016 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 18:38:19,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-10 18:38:21,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=694420.0, ans=0.125 2024-08-10 18:38:28,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=694520.0, ans=0.125 2024-08-10 18:38:41,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694620.0, ans=0.1 2024-08-10 18:38:47,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=694620.0, ans=0.2 2024-08-10 18:38:51,458 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 18:38:52,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=694620.0, ans=0.0 2024-08-10 18:38:57,192 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11500, loss[loss=0.1239, beats_loss=0.01083, ecapa_loss=0.0002107, whisper_loss=0.1109, over 22414.00 frames. 
], tot_loss[loss=0.1106, beats_loss=0.01165, ecapa_loss=0.0002342, whisper_loss=0.09657, over 3966283.48 frames. ], batch size: 87, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:38:59,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=694720.0, ans=0.0 2024-08-10 18:39:01,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=15.0 2024-08-10 18:39:08,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694720.0, ans=0.1 2024-08-10 18:39:37,971 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 18:39:39,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=694920.0, ans=0.0 2024-08-10 18:39:40,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-08-10 18:39:40,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.741e+01 3.082e+01 3.618e+01 5.964e+01, threshold=6.164e+01, percent-clipped=0.0 2024-08-10 18:39:50,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=695020.0, ans=0.95 2024-08-10 18:40:13,279 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 40 from Vox, 26 fro AS 2024-08-10 18:40:14,797 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11550, loss[loss=0.09997, beats_loss=0.009266, ecapa_loss=0.0003927, whisper_loss=0.08677, over 20214.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01159, ecapa_loss=0.0002353, whisper_loss=0.09695, over 3969436.20 frames. 
], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:40:16,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=695220.0, ans=0.125 2024-08-10 18:40:21,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695220.0, ans=0.1 2024-08-10 18:40:27,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-10 18:40:39,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=695320.0, ans=0.125 2024-08-10 18:41:14,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-08-10 18:41:31,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=695620.0, ans=0.125 2024-08-10 18:41:33,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11600, loss[loss=0.1098, beats_loss=0.01064, ecapa_loss=0.000221, whisper_loss=0.09692, over 18295.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01154, ecapa_loss=0.0002352, whisper_loss=0.09731, over 3968750.55 frames. ], batch size: 72, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:41:57,323 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 18:41:59,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=695820.0, ans=0.125 2024-08-10 18:42:02,121 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-10 18:42:04,937 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 18:42:16,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.928e+01 3.314e+01 3.952e+01 8.355e+01, threshold=6.627e+01, percent-clipped=1.0 2024-08-10 18:42:32,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=696120.0, ans=0.125 2024-08-10 18:42:43,180 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-10 18:42:49,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11650, loss[loss=0.09979, beats_loss=0.01305, ecapa_loss=0.0002214, whisper_loss=0.08452, over 18255.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0116, ecapa_loss=0.0002335, whisper_loss=0.09726, over 3982338.46 frames. ], batch size: 75, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:42:50,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=696220.0, ans=0.125 2024-08-10 18:43:13,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-10 18:43:14,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=696320.0, ans=0.125 2024-08-10 18:43:25,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=696420.0, ans=0.0 2024-08-10 18:43:39,223 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 18:43:59,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=696720.0, ans=15.0 2024-08-10 18:43:59,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11700, loss[loss=0.0947, beats_loss=0.01555, ecapa_loss=0.0001754, whisper_loss=0.07739, over 21215.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01167, ecapa_loss=0.0002326, whisper_loss=0.09685, over 3986922.31 frames. ], batch size: 84, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:44:29,394 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 18:44:32,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=696920.0, ans=0.2 2024-08-10 18:44:39,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.923e+01 3.356e+01 3.959e+01 5.415e+01, threshold=6.712e+01, percent-clipped=0.0 2024-08-10 18:44:53,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=697020.0, ans=0.125 2024-08-10 18:45:02,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=697120.0, ans=0.0 2024-08-10 18:45:09,080 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-10 18:45:10,063 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11750, loss[loss=0.09115, beats_loss=0.01051, ecapa_loss=0.0003224, whisper_loss=0.07742, over 20053.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01174, ecapa_loss=0.0002324, whisper_loss=0.09657, over 3958765.47 frames. 
], batch size: 90, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:45:13,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=697220.0, ans=0.2 2024-08-10 18:45:21,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=697220.0, ans=0.1 2024-08-10 18:45:23,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=697320.0, ans=0.2 2024-08-10 18:45:30,113 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 18:45:31,359 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 18:46:19,363 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11800, loss[loss=0.09521, beats_loss=0.01434, ecapa_loss=0.0001741, whisper_loss=0.07913, over 17034.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01176, ecapa_loss=0.0002323, whisper_loss=0.09611, over 3946223.83 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:46:30,788 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 18:46:32,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=697820.0, ans=0.125 2024-08-10 18:46:38,458 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:46:48,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697920.0, ans=0.1 2024-08-10 18:46:58,859 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 3.039e+01 3.457e+01 3.903e+01 6.365e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 18:47:00,382 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-10 18:47:08,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=698020.0, ans=0.125 2024-08-10 18:47:08,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=698020.0, ans=0.125 2024-08-10 18:47:16,276 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 18:47:22,194 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 18:47:24,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-10 18:47:29,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=698220.0, ans=0.0 2024-08-10 18:47:30,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11850, loss[loss=0.1149, beats_loss=0.01171, ecapa_loss=0.0002703, whisper_loss=0.1005, over 22806.00 frames. 
], tot_loss[loss=0.1092, beats_loss=0.01186, ecapa_loss=0.000232, whisper_loss=0.09503, over 3965623.42 frames. ], batch size: 92, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:47:46,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=698320.0, ans=0.07 2024-08-10 18:47:51,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=698320.0, ans=22.5 2024-08-10 18:47:52,341 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 18:47:54,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698320.0, ans=0.125 2024-08-10 18:48:02,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=698420.0, ans=0.125 2024-08-10 18:48:15,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=698520.0, ans=0.125 2024-08-10 18:48:15,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=698520.0, ans=22.5 2024-08-10 18:48:16,793 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 18:48:20,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=698520.0, ans=0.125 2024-08-10 18:48:24,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=698620.0, ans=0.0 2024-08-10 18:48:28,499 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 18:48:35,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=698620.0, ans=0.125 2024-08-10 18:48:36,537 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 18:48:39,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11900, loss[loss=0.1099, beats_loss=0.01197, ecapa_loss=0.0002457, whisper_loss=0.09549, over 18353.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01182, ecapa_loss=0.0002334, whisper_loss=0.09511, over 3948916.29 frames. ], batch size: 75, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:49:17,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.855e+01 3.159e+01 3.498e+01 6.204e+01, threshold=6.318e+01, percent-clipped=0.0 2024-08-10 18:49:24,552 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 18:49:29,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=699020.0, ans=0.1 2024-08-10 18:49:35,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=699120.0, ans=10.0 2024-08-10 18:49:45,532 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 18:49:46,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 11950, loss[loss=0.08514, beats_loss=0.01264, ecapa_loss=0.0002721, whisper_loss=0.06978, over 14716.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01177, ecapa_loss=0.0002336, whisper_loss=0.09467, over 3897025.06 frames. 
], batch size: 62, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:49:48,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=699220.0, ans=0.0 2024-08-10 18:49:50,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=699220.0, ans=0.0 2024-08-10 18:49:57,889 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 18:50:00,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=699320.0, ans=10.0 2024-08-10 18:50:03,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=699320.0, ans=0.125 2024-08-10 18:50:13,528 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 18:50:24,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699420.0, ans=0.1 2024-08-10 18:50:33,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=699520.0, ans=0.2 2024-08-10 18:50:33,314 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-10 18:50:34,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0 2024-08-10 18:50:38,067 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 18:50:40,685 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 18:50:41,954 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 18:50:43,263 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 18:50:45,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=699620.0, ans=0.125 2024-08-10 18:50:47,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=699620.0, ans=0.125 2024-08-10 18:50:53,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12000, loss[loss=0.09886, beats_loss=0.01261, ecapa_loss=0.0002186, whisper_loss=0.08407, over 20843.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0119, ecapa_loss=0.0002312, whisper_loss=0.09416, over 3888230.11 frames. ], batch size: 82, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:50:53,566 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 18:51:35,550 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007279, whisper_loss=0.255, over 922467.00 frames. 2024-08-10 18:51:54,125 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on SV_voxceleb1: loss=0.006203, beats_loss=0, ecapa_loss=0.0006203, whisper_loss=0, over 939242.00 frames. 2024-08-10 18:53:47,250 INFO [train_multi_KD3.py:1149] (1/4) Epoch 5, validation on AT_audioset: loss=0.02662, beats_loss=0.02662, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 18:53:47,254 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 18:53:50,120 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 18:54:18,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=699920.0, ans=0.2 2024-08-10 18:54:21,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=699920.0, ans=0.05 2024-08-10 18:54:25,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.732e+01 3.134e+01 3.531e+01 7.163e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 18:54:30,934 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.863e-01 2024-08-10 18:54:32,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0 2024-08-10 18:54:40,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=700120.0, ans=0.0 2024-08-10 18:54:40,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=700120.0, ans=0.09899494936611666 2024-08-10 18:54:41,937 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.333e+05 2024-08-10 18:54:48,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=700120.0, ans=0.0 2024-08-10 18:54:54,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12050, loss[loss=0.1188, beats_loss=0.009576, ecapa_loss=0.0002279, whisper_loss=0.1069, over 16176.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01188, ecapa_loss=0.0002303, whisper_loss=0.09406, over 3870088.49 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:55:00,363 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
31 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 18:55:00,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=700220.0, ans=0.125 2024-08-10 18:55:05,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=700220.0, ans=15.0 2024-08-10 18:55:07,125 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 18:55:07,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700320.0, ans=0.1 2024-08-10 18:55:09,626 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 18:55:30,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=700420.0, ans=0.125 2024-08-10 18:55:41,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0 2024-08-10 18:55:42,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=700520.0, ans=0.2 2024-08-10 18:55:43,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=700520.0, ans=0.125 2024-08-10 18:55:45,590 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 18:55:54,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.58 vs. limit=10.0 2024-08-10 18:56:02,372 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12100, loss[loss=0.09961, beats_loss=0.0138, ecapa_loss=0.0002185, whisper_loss=0.08363, over 16262.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01178, ecapa_loss=0.0002321, whisper_loss=0.09437, over 3871514.26 frames. ], batch size: 67, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:56:14,291 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 18:56:14,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=700820.0, ans=0.125 2024-08-10 18:56:33,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=700920.0, ans=0.0 2024-08-10 18:56:40,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.774e+01 3.188e+01 3.789e+01 5.825e+01, threshold=6.376e+01, percent-clipped=0.0 2024-08-10 18:56:52,388 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-10 18:57:09,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12150, loss[loss=0.1042, beats_loss=0.01011, ecapa_loss=0.000317, whisper_loss=0.09092, over 16039.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01183, ecapa_loss=0.0002319, whisper_loss=0.09463, over 3900084.90 frames. ], batch size: 71, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:57:25,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2024-08-10 18:57:42,961 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 18:57:43,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=701420.0, ans=0.125 2024-08-10 18:57:50,867 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 18:58:06,403 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
21 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-10 18:58:09,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=701620.0, ans=0.125 2024-08-10 18:58:11,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=701620.0, ans=0.125 2024-08-10 18:58:15,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=701620.0, ans=0.0 2024-08-10 18:58:17,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12200, loss[loss=0.09805, beats_loss=0.01327, ecapa_loss=0.0001998, whisper_loss=0.08278, over 21456.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01187, ecapa_loss=0.000232, whisper_loss=0.09469, over 3917996.32 frames. ], batch size: 84, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:58:24,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.93 vs. limit=12.0 2024-08-10 18:58:44,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=701920.0, ans=0.125 2024-08-10 18:58:44,843 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-08-10 18:58:55,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.903e+01 3.177e+01 3.659e+01 7.236e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 18:58:58,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2024-08-10 18:59:06,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. 
limit=15.0 2024-08-10 18:59:09,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=702020.0, ans=0.2 2024-08-10 18:59:10,949 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-10 18:59:12,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=702120.0, ans=0.035 2024-08-10 18:59:25,045 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12250, loss[loss=0.09585, beats_loss=0.01233, ecapa_loss=0.0002013, whisper_loss=0.08151, over 22107.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01181, ecapa_loss=0.0002334, whisper_loss=0.09443, over 3897316.02 frames. ], batch size: 90, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:59:41,268 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 18:59:43,731 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 18:59:57,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=702420.0, ans=0.0 2024-08-10 19:00:10,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=702520.0, ans=0.125 2024-08-10 19:00:23,623 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 19:00:30,207 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 19:00:32,586 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12300, loss[loss=0.104, beats_loss=0.01389, ecapa_loss=0.0001999, whisper_loss=0.08806, over 22565.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01179, ecapa_loss=0.0002338, whisper_loss=0.09479, over 3888587.27 frames. 
], batch size: 90, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:00:43,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702720.0, ans=0.1 2024-08-10 19:00:53,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=15.0 2024-08-10 19:00:58,064 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-10 19:01:09,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=702920.0, ans=0.2 2024-08-10 19:01:10,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.851e+01 3.322e+01 3.771e+01 6.110e+01, threshold=6.644e+01, percent-clipped=0.0 2024-08-10 19:01:17,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=703020.0, ans=0.125 2024-08-10 19:01:17,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=703020.0, ans=0.125 2024-08-10 19:01:20,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0 2024-08-10 19:01:30,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-10 19:01:30,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=703120.0, ans=0.0 2024-08-10 19:01:39,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12350, loss[loss=0.1207, beats_loss=0.009255, ecapa_loss=0.0003204, whisper_loss=0.1082, over 16687.00 frames. 
], tot_loss[loss=0.1088, beats_loss=0.0118, ecapa_loss=0.0002359, whisper_loss=0.09464, over 3884759.83 frames. ], batch size: 72, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:01:40,347 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.602e-01 2024-08-10 19:02:11,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=703420.0, ans=0.125 2024-08-10 19:02:11,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-10 19:02:12,918 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.013e-01 2024-08-10 19:02:35,495 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 19:02:46,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=703620.0, ans=0.125 2024-08-10 19:02:52,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12400, loss[loss=0.1182, beats_loss=0.008742, ecapa_loss=0.0002354, whisper_loss=0.1071, over 23234.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01176, ecapa_loss=0.0002343, whisper_loss=0.0941, over 3893995.47 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:02:59,613 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 19:03:00,782 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 19:03:02,009 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 19:03:21,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=703920.0, ans=0.125 2024-08-10 19:03:31,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.592e+01 3.077e+01 3.649e+01 6.276e+01, threshold=6.154e+01, percent-clipped=0.0 2024-08-10 19:03:32,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=703920.0, ans=0.125 2024-08-10 19:03:57,360 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 19:04:03,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12450, loss[loss=0.08134, beats_loss=0.01052, ecapa_loss=0.00025, whisper_loss=0.06832, over 13651.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01173, ecapa_loss=0.0002339, whisper_loss=0.0939, over 3904701.54 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:04:21,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=704320.0, ans=15.0 2024-08-10 19:04:28,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=704420.0, ans=0.125 2024-08-10 19:04:37,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.08 vs. 
limit=15.0 2024-08-10 19:04:45,752 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.219e-01 2024-08-10 19:04:51,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=704520.0, ans=0.0 2024-08-10 19:04:59,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=704620.0, ans=0.125 2024-08-10 19:05:07,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-08-10 19:05:12,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12500, loss[loss=0.1039, beats_loss=0.01093, ecapa_loss=0.0002158, whisper_loss=0.09084, over 23223.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01177, ecapa_loss=0.0002317, whisper_loss=0.09392, over 3933995.83 frames. ], batch size: 92, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:05:16,880 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 19:05:39,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=704920.0, ans=0.0 2024-08-10 19:05:40,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-08-10 19:05:51,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.864e+01 3.204e+01 3.870e+01 6.784e+01, threshold=6.407e+01, percent-clipped=3.0 2024-08-10 19:05:56,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=705020.0, ans=0.0 2024-08-10 19:05:57,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=705020.0, ans=0.2 2024-08-10 19:06:00,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705020.0, ans=0.1 2024-08-10 19:06:05,385 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 19:06:08,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=705120.0, ans=0.125 2024-08-10 19:06:09,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2024-08-10 19:06:20,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=705220.0, ans=0.125 2024-08-10 19:06:21,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12550, loss[loss=0.123, beats_loss=0.01442, ecapa_loss=0.0001759, whisper_loss=0.1069, over 22595.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01185, ecapa_loss=0.0002335, whisper_loss=0.09437, over 3956990.68 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:06:40,626 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 19:06:43,892 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
29 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 19:06:47,905 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-10 19:06:48,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-08-10 19:06:50,687 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 19:06:58,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2024-08-10 19:07:13,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=705520.0, ans=0.2 2024-08-10 19:07:32,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=705620.0, ans=0.0 2024-08-10 19:07:34,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12600, loss[loss=0.1026, beats_loss=0.01323, ecapa_loss=0.0002281, whisper_loss=0.08713, over 14374.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01181, ecapa_loss=0.0002353, whisper_loss=0.09453, over 3949839.48 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:07:53,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2024-08-10 19:07:58,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=705820.0, ans=0.0 2024-08-10 19:08:12,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.787e+01 3.074e+01 3.484e+01 6.689e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-10 19:08:26,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.19 vs. limit=15.0 2024-08-10 19:08:42,080 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12650, loss[loss=0.1254, beats_loss=0.007832, ecapa_loss=0.0003352, whisper_loss=0.1142, over 20978.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01177, ecapa_loss=0.0002352, whisper_loss=0.0946, over 3934602.25 frames. ], batch size: 90, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:09:16,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2024-08-10 19:09:18,790 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 19:09:31,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=706520.0, ans=0.2 2024-08-10 19:09:31,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706520.0, ans=0.1 2024-08-10 19:09:44,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=706620.0, ans=0.125 2024-08-10 19:09:49,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12700, loss[loss=0.1171, beats_loss=0.01059, ecapa_loss=0.0002729, whisper_loss=0.1037, over 22637.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002364, whisper_loss=0.09488, over 3913398.25 frames. 
], batch size: 94, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:09:53,299 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-10 19:10:01,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=706820.0, ans=0.125 2024-08-10 19:10:04,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706820.0, ans=0.125 2024-08-10 19:10:13,959 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 19:10:18,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=706920.0, ans=0.09899494936611666 2024-08-10 19:10:27,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.805e+01 3.118e+01 3.753e+01 7.808e+01, threshold=6.236e+01, percent-clipped=1.0 2024-08-10 19:10:30,008 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 19:10:34,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707020.0, ans=0.1 2024-08-10 19:10:47,349 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 19:10:57,244 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12750, loss[loss=0.1106, beats_loss=0.01272, ecapa_loss=0.0002314, whisper_loss=0.09559, over 20666.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01178, ecapa_loss=0.0002365, whisper_loss=0.09524, over 3928869.85 frames. ], batch size: 84, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:11:03,206 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
17 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 19:11:29,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=707420.0, ans=0.2 2024-08-10 19:11:36,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=707520.0, ans=15.0 2024-08-10 19:11:37,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=707520.0, ans=0.125 2024-08-10 19:11:56,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=707620.0, ans=0.125 2024-08-10 19:12:04,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12800, loss[loss=0.09943, beats_loss=0.01229, ecapa_loss=0.0002696, whisper_loss=0.08445, over 14292.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01181, ecapa_loss=0.0002381, whisper_loss=0.0944, over 3894776.58 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:12:04,868 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 19:12:09,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707720.0, ans=0.1 2024-08-10 19:12:10,380 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 19:12:15,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=707720.0, ans=0.125 2024-08-10 19:12:26,377 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
20 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-10 19:12:29,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=707820.0, ans=0.0 2024-08-10 19:12:39,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=707920.0, ans=0.2 2024-08-10 19:12:41,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.755e+01 3.116e+01 3.558e+01 5.514e+01, threshold=6.233e+01, percent-clipped=0.0 2024-08-10 19:12:59,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=708120.0, ans=22.5 2024-08-10 19:13:02,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-10 19:13:11,194 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12850, loss[loss=0.1142, beats_loss=0.01233, ecapa_loss=0.000245, whisper_loss=0.09938, over 20050.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0119, ecapa_loss=0.0002377, whisper_loss=0.09372, over 3890936.99 frames. ], batch size: 82, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:13:23,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=708220.0, ans=22.5 2024-08-10 19:13:24,613 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-10 19:13:32,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=708320.0, ans=0.125 2024-08-10 19:13:34,404 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-08-10 19:13:37,941 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 19:13:52,693 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 19:13:59,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=708520.0, ans=0.0 2024-08-10 19:14:08,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=708620.0, ans=0.0 2024-08-10 19:14:09,702 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 19:14:09,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=708620.0, ans=0.125 2024-08-10 19:14:17,576 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 19:14:17,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=708720.0, ans=0.035 2024-08-10 19:14:18,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12900, loss[loss=0.1111, beats_loss=0.01224, ecapa_loss=0.0002279, whisper_loss=0.09662, over 22259.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01183, ecapa_loss=0.0002388, whisper_loss=0.09345, over 3836551.66 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:14:24,561 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 19:14:24,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=708720.0, ans=0.0 2024-08-10 19:14:24,757 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.142e-02 2024-08-10 19:14:43,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=708820.0, ans=0.125 2024-08-10 19:14:48,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2024-08-10 19:14:52,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=708920.0, ans=0.0 2024-08-10 19:14:55,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.870e+01 3.277e+01 3.550e+01 6.009e+01, threshold=6.554e+01, percent-clipped=0.0 2024-08-10 19:15:01,851 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 19:15:02,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-08-10 19:15:03,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=709020.0, ans=0.0 2024-08-10 19:15:14,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=709120.0, ans=0.0 2024-08-10 19:15:24,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 12950, loss[loss=0.09048, beats_loss=0.01217, ecapa_loss=0.0002237, whisper_loss=0.07607, over 20030.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01181, ecapa_loss=0.0002393, whisper_loss=0.09356, over 3841119.56 frames. 
], batch size: 81, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:15:47,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-10 19:15:52,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=709420.0, ans=0.2 2024-08-10 19:16:06,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=709520.0, ans=0.2 2024-08-10 19:16:17,898 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 19:16:21,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=709620.0, ans=0.125 2024-08-10 19:16:27,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=709620.0, ans=0.2 2024-08-10 19:16:30,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13000, loss[loss=0.09586, beats_loss=0.01433, ecapa_loss=0.000227, whisper_loss=0.07927, over 19850.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01183, ecapa_loss=0.000237, whisper_loss=0.09354, over 3847751.61 frames. ], batch size: 78, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:16:31,568 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 19:16:34,081 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 19:16:48,597 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-10 19:16:55,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709920.0, ans=0.125 2024-08-10 19:16:56,198 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-10 19:16:56,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.36 vs. limit=22.5 2024-08-10 19:17:06,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.942e+01 3.329e+01 3.753e+01 5.609e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 19:17:08,371 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 31 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 19:17:12,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=710020.0, ans=0.125 2024-08-10 19:17:20,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=710020.0, ans=0.125 2024-08-10 19:17:24,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=710120.0, ans=0.125 2024-08-10 19:17:35,813 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13050, loss[loss=0.1215, beats_loss=0.008211, ecapa_loss=0.0002832, whisper_loss=0.1104, over 15324.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01176, ecapa_loss=0.0002372, whisper_loss=0.09391, over 3881706.82 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:17:36,013 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 47 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 19:17:45,077 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-10 19:17:48,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=710320.0, ans=0.5 2024-08-10 19:18:04,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=15.0 2024-08-10 19:18:11,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=710420.0, ans=0.125 2024-08-10 19:18:15,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710520.0, ans=0.125 2024-08-10 19:18:17,082 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 19:18:26,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=710520.0, ans=0.0 2024-08-10 19:18:31,942 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 19:18:42,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13100, loss[loss=0.104, beats_loss=0.01316, ecapa_loss=0.0001869, whisper_loss=0.08894, over 22331.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01172, ecapa_loss=0.0002363, whisper_loss=0.09439, over 3880359.32 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:18:49,202 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 19:18:54,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=710820.0, ans=0.0 2024-08-10 19:18:57,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=710820.0, ans=0.0 2024-08-10 19:19:14,546 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 19:19:16,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2024-08-10 19:19:19,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.880e+01 3.300e+01 3.880e+01 5.965e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-10 19:19:26,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=711020.0, ans=0.125 2024-08-10 19:19:39,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=711120.0, ans=0.95 2024-08-10 19:19:43,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=711120.0, ans=0.0 2024-08-10 19:19:48,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13150, loss[loss=0.09817, beats_loss=0.0127, ecapa_loss=0.000243, whisper_loss=0.08305, over 15079.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0118, ecapa_loss=0.0002352, whisper_loss=0.09423, over 3846353.99 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:19:53,510 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.977e-01 2024-08-10 19:20:05,343 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 19:20:11,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=711320.0, ans=0.0 2024-08-10 19:20:22,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=711420.0, ans=0.125 2024-08-10 19:20:24,832 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 19:20:30,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=711520.0, ans=0.0 2024-08-10 19:20:45,445 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 19:20:53,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711620.0, ans=0.1 2024-08-10 19:20:59,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13200, loss[loss=0.1057, beats_loss=0.01255, ecapa_loss=0.0001961, whisper_loss=0.09119, over 18064.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01178, ecapa_loss=0.0002339, whisper_loss=0.09389, over 3803593.88 frames. ], batch size: 69, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:21:09,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-10 19:21:16,936 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 19:21:23,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711820.0, ans=0.1 2024-08-10 19:21:27,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=711920.0, ans=0.0 2024-08-10 19:21:28,456 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 19:21:37,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.006e+01 3.463e+01 3.966e+01 7.207e+01, threshold=6.927e+01, percent-clipped=1.0 2024-08-10 19:21:45,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=712020.0, ans=0.2 2024-08-10 19:21:47,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=712020.0, ans=0.125 2024-08-10 19:21:50,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=712020.0, ans=0.0 2024-08-10 19:22:01,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-10 19:22:05,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13250, loss[loss=0.06379, beats_loss=0.01506, ecapa_loss=0.0002628, whisper_loss=0.0461, over 13225.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01176, ecapa_loss=0.0002354, whisper_loss=0.09417, over 3810279.66 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:22:11,673 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 19:22:28,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=712320.0, ans=0.125 2024-08-10 19:22:33,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=712420.0, ans=0.125 2024-08-10 19:22:48,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=712520.0, ans=0.125 2024-08-10 19:22:52,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=712520.0, ans=0.2 2024-08-10 19:22:54,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=712520.0, ans=0.125 2024-08-10 19:23:00,356 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 10 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 19:23:11,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13300, loss[loss=0.1118, beats_loss=0.01102, ecapa_loss=0.0002188, whisper_loss=0.09855, over 22034.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01178, ecapa_loss=0.0002352, whisper_loss=0.09367, over 3815815.00 frames. ], batch size: 87, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:23:17,554 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-10 19:23:17,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=712720.0, ans=0.0 2024-08-10 19:23:19,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. 
limit=15.0 2024-08-10 19:23:21,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=712720.0, ans=0.0 2024-08-10 19:23:32,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=712820.0, ans=0.2 2024-08-10 19:23:33,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=712820.0, ans=0.125 2024-08-10 19:23:45,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=712920.0, ans=0.0 2024-08-10 19:23:50,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.791e+01 3.143e+01 3.422e+01 5.648e+01, threshold=6.287e+01, percent-clipped=0.0 2024-08-10 19:23:55,158 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-10 19:23:55,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=713020.0, ans=0.125 2024-08-10 19:24:12,391 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 19:24:12,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=713120.0, ans=0.125 2024-08-10 19:24:13,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=713120.0, ans=0.2 2024-08-10 19:24:15,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-10 19:24:16,390 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 19:24:18,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-10 19:24:19,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=713220.0, ans=0.0 2024-08-10 19:24:19,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13350, loss[loss=0.1084, beats_loss=0.01087, ecapa_loss=0.0002277, whisper_loss=0.09524, over 18312.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01177, ecapa_loss=0.0002349, whisper_loss=0.09405, over 3852753.22 frames. ], batch size: 70, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:24:20,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=713220.0, ans=0.0 2024-08-10 19:24:32,280 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 25 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-10 19:24:43,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=713320.0, ans=0.125 2024-08-10 19:25:03,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0 2024-08-10 19:25:09,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=713520.0, ans=0.1 2024-08-10 19:25:26,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13400, loss[loss=0.1119, beats_loss=0.01171, ecapa_loss=0.0002741, whisper_loss=0.09748, over 16845.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01167, ecapa_loss=0.0002352, whisper_loss=0.09469, over 3826668.34 frames. 
], batch size: 69, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:25:28,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=713720.0, ans=0.125 2024-08-10 19:25:34,973 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 19:25:46,598 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.266e-01 2024-08-10 19:25:53,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=713920.0, ans=0.125 2024-08-10 19:25:59,060 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 19:26:05,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.716e+01 3.114e+01 3.677e+01 5.856e+01, threshold=6.229e+01, percent-clipped=0.0 2024-08-10 19:26:38,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13450, loss[loss=0.08662, beats_loss=0.01447, ecapa_loss=0.0002565, whisper_loss=0.06958, over 18002.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01168, ecapa_loss=0.0002352, whisper_loss=0.09499, over 3848109.97 frames. ], batch size: 75, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:26:40,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-10 19:26:46,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. 
limit=22.5 2024-08-10 19:26:50,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=714220.0, ans=0.2 2024-08-10 19:27:06,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-10 19:27:11,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.97 vs. limit=6.0 2024-08-10 19:27:44,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714520.0, ans=0.0 2024-08-10 19:28:07,749 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 19:28:18,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13500, loss[loss=0.1116, beats_loss=0.01034, ecapa_loss=0.0002625, whisper_loss=0.09859, over 23410.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01166, ecapa_loss=0.0002364, whisper_loss=0.09505, over 3856423.88 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:28:40,770 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 19:28:44,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.09 vs. 
limit=6.0 2024-08-10 19:29:05,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714920.0, ans=0.1 2024-08-10 19:29:11,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.843e+01 3.302e+01 3.860e+01 1.367e+02, threshold=6.604e+01, percent-clipped=1.0 2024-08-10 19:29:43,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13550, loss[loss=0.1049, beats_loss=0.0111, ecapa_loss=0.000172, whisper_loss=0.09209, over 19811.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01179, ecapa_loss=0.0002329, whisper_loss=0.09449, over 3898910.15 frames. ], batch size: 76, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:29:47,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0 2024-08-10 19:29:53,372 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 19:30:00,828 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 19:30:10,363 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 19:30:14,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=715420.0, ans=0.05 2024-08-10 19:30:56,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13600, loss[loss=0.132, beats_loss=0.009736, ecapa_loss=0.0002278, whisper_loss=0.12, over 21898.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01186, ecapa_loss=0.0002309, whisper_loss=0.09468, over 3911406.52 frames. 
], batch size: 84, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:31:21,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=715820.0, ans=0.125 2024-08-10 19:31:25,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-10 19:31:40,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 3.018e+01 3.345e+01 4.176e+01 9.829e+01, threshold=6.690e+01, percent-clipped=2.0 2024-08-10 19:31:55,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=716020.0, ans=0.125 2024-08-10 19:32:13,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13650, loss[loss=0.09293, beats_loss=0.01093, ecapa_loss=0.0002314, whisper_loss=0.07969, over 21063.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01191, ecapa_loss=0.0002319, whisper_loss=0.09486, over 3942990.34 frames. ], batch size: 86, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:32:38,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=716320.0, ans=0.0 2024-08-10 19:32:46,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=716420.0, ans=0.0 2024-08-10 19:32:54,818 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 19:32:55,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716420.0, ans=0.1 2024-08-10 19:32:56,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. 
limit=15.0 2024-08-10 19:33:01,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=716520.0, ans=0.125 2024-08-10 19:33:06,046 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 19:33:29,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13700, loss[loss=0.1206, beats_loss=0.01399, ecapa_loss=0.000235, whisper_loss=0.1042, over 22618.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0119, ecapa_loss=0.0002311, whisper_loss=0.09522, over 3949776.33 frames. ], batch size: 94, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:33:45,504 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 34 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 19:33:48,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=716820.0, ans=0.0 2024-08-10 19:33:53,889 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 19:34:12,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.823e+01 3.317e+01 3.890e+01 6.067e+01, threshold=6.634e+01, percent-clipped=0.0 2024-08-10 19:34:13,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-08-10 19:34:42,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2024-08-10 19:34:44,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=717120.0, ans=0.04949747468305833 2024-08-10 19:34:46,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13750, loss[loss=0.0967, beats_loss=0.01275, ecapa_loss=0.0002108, whisper_loss=0.08184, over 20861.00 frames. 
], tot_loss[loss=0.1088, beats_loss=0.01185, ecapa_loss=0.0002314, whisper_loss=0.0946, over 3910590.86 frames. ], batch size: 84, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:35:51,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=717620.0, ans=0.0 2024-08-10 19:36:01,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=717720.0, ans=0.125 2024-08-10 19:36:02,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13800, loss[loss=0.1438, beats_loss=0.007375, ecapa_loss=0.000241, whisper_loss=0.134, over 15338.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0118, ecapa_loss=0.0002297, whisper_loss=0.09535, over 3920396.05 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:36:34,487 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 19:36:42,655 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 19:36:46,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+01 2.754e+01 3.224e+01 3.629e+01 6.153e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-10 19:36:48,111 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 19:36:55,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718020.0, ans=0.1 2024-08-10 19:37:10,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=718120.0, ans=0.125 2024-08-10 19:37:21,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13850, loss[loss=0.1227, beats_loss=0.01068, ecapa_loss=0.0002344, whisper_loss=0.1097, over 21656.00 frames. 
], tot_loss[loss=0.1093, beats_loss=0.01171, ecapa_loss=0.0002305, whisper_loss=0.09529, over 3908719.96 frames. ], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:37:37,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2024-08-10 19:38:07,274 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.978e-01 2024-08-10 19:38:40,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=718720.0, ans=0.125 2024-08-10 19:38:41,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13900, loss[loss=0.1121, beats_loss=0.009924, ecapa_loss=0.0002612, whisper_loss=0.09957, over 13683.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0118, ecapa_loss=0.0002307, whisper_loss=0.09533, over 3889766.74 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:38:41,562 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-10 19:38:55,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=718820.0, ans=0.125 2024-08-10 19:39:01,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. 
limit=15.0 2024-08-10 19:39:11,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=718920.0, ans=0.2 2024-08-10 19:39:14,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718920.0, ans=0.1 2024-08-10 19:39:16,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2024-08-10 19:39:24,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.959e+01 3.276e+01 3.717e+01 7.288e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 19:39:32,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=719020.0, ans=0.125 2024-08-10 19:39:42,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=719120.0, ans=0.0 2024-08-10 19:39:42,987 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-10 19:39:43,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=719120.0, ans=0.2 2024-08-10 19:39:46,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=719120.0, ans=0.125 2024-08-10 19:39:57,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 13950, loss[loss=0.1001, beats_loss=0.01133, ecapa_loss=0.0002782, whisper_loss=0.08595, over 16992.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01177, ecapa_loss=0.0002303, whisper_loss=0.09564, over 3900737.25 frames. ], batch size: 68, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:40:02,802 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 19:40:32,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5 2024-08-10 19:41:13,741 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14000, loss[loss=0.1085, beats_loss=0.01283, ecapa_loss=0.0001693, whisper_loss=0.09395, over 21191.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01181, ecapa_loss=0.0002297, whisper_loss=0.09534, over 3893988.28 frames. ], batch size: 82, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:41:18,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=719720.0, ans=0.0 2024-08-10 19:41:22,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=719720.0, ans=0.2 2024-08-10 19:41:41,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=15.0 2024-08-10 19:41:53,236 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 17 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 19:41:59,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.816e+01 3.391e+01 3.815e+01 6.287e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 19:42:34,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14050, loss[loss=0.1363, beats_loss=0.009386, ecapa_loss=0.0002216, whisper_loss=0.1247, over 23800.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01176, ecapa_loss=0.0002299, whisper_loss=0.09628, over 3910987.26 frames. 
], batch size: 89, lr: 1.18e-02, grad_scale: 2199023255552.0 2024-08-10 19:42:45,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720220.0, ans=0.1 2024-08-10 19:43:01,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720320.0, ans=0.125 2024-08-10 19:43:14,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=720420.0, ans=0.2 2024-08-10 19:43:42,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-10 19:43:44,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=720620.0, ans=0.2 2024-08-10 19:43:45,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2024-08-10 19:43:51,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14100, loss[loss=0.1118, beats_loss=0.01276, ecapa_loss=0.0002143, whisper_loss=0.09693, over 22111.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01176, ecapa_loss=0.0002308, whisper_loss=0.09591, over 3899262.06 frames. ], batch size: 90, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:44:03,624 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.149e-02 2024-08-10 19:44:06,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=720820.0, ans=0.125 2024-08-10 19:44:11,045 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 19:44:28,522 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 19:44:28,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=720920.0, ans=0.125 2024-08-10 19:44:32,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.752e+01 3.141e+01 3.762e+01 7.016e+01, threshold=6.282e+01, percent-clipped=2.0 2024-08-10 19:44:35,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=721020.0, ans=0.0 2024-08-10 19:44:41,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=721020.0, ans=0.0 2024-08-10 19:44:55,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=721120.0, ans=0.2 2024-08-10 19:44:59,319 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 19:45:06,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14150, loss[loss=0.1085, beats_loss=0.01201, ecapa_loss=0.0002345, whisper_loss=0.09411, over 22250.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01173, ecapa_loss=0.0002311, whisper_loss=0.09544, over 3879749.23 frames. ], batch size: 88, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:45:09,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=721220.0, ans=0.0 2024-08-10 19:45:11,214 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 19:45:13,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=721220.0, ans=0.125 2024-08-10 19:45:14,502 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
38 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 19:45:38,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=721420.0, ans=0.5 2024-08-10 19:46:19,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=721620.0, ans=0.5 2024-08-10 19:46:21,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14200, loss[loss=0.1146, beats_loss=0.009923, ecapa_loss=0.0002917, whisper_loss=0.1018, over 21853.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01176, ecapa_loss=0.0002297, whisper_loss=0.09487, over 3910070.95 frames. ], batch size: 88, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:46:30,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.88 vs. limit=15.0 2024-08-10 19:46:51,025 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 19:47:04,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.830e+01 3.191e+01 3.752e+01 5.497e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-10 19:47:18,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=722020.0, ans=0.0 2024-08-10 19:47:18,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=722020.0, ans=0.05 2024-08-10 19:47:20,919 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 19:47:24,366 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 19:47:27,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722120.0, ans=0.125 2024-08-10 19:47:38,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14250, loss[loss=0.1017, beats_loss=0.01166, ecapa_loss=0.0002607, whisper_loss=0.08743, over 16311.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01176, ecapa_loss=0.000229, whisper_loss=0.09544, over 3926923.16 frames. ], batch size: 69, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:47:44,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=722220.0, ans=0.2 2024-08-10 19:47:45,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=722220.0, ans=0.125 2024-08-10 19:47:58,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=722320.0, ans=0.0 2024-08-10 19:48:00,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722320.0, ans=0.1 2024-08-10 19:48:04,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=722320.0, ans=0.0 2024-08-10 19:48:14,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-08-10 19:48:34,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.30 vs. limit=15.0 2024-08-10 19:48:36,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.24 vs. 
limit=15.0 2024-08-10 19:48:38,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2024-08-10 19:48:47,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=722620.0, ans=0.125 2024-08-10 19:48:56,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14300, loss[loss=0.111, beats_loss=0.01198, ecapa_loss=0.0002136, whisper_loss=0.09687, over 18122.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01178, ecapa_loss=0.0002285, whisper_loss=0.09545, over 3922623.81 frames. ], batch size: 70, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:48:57,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=722720.0, ans=0.09899494936611666 2024-08-10 19:48:59,630 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 19:49:26,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=722920.0, ans=0.0 2024-08-10 19:49:40,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.839e+01 3.149e+01 3.823e+01 7.710e+01, threshold=6.298e+01, percent-clipped=1.0 2024-08-10 19:49:49,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=723020.0, ans=0.125 2024-08-10 19:50:15,320 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14350, loss[loss=0.1179, beats_loss=0.008854, ecapa_loss=0.0002595, whisper_loss=0.1064, over 14455.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01182, ecapa_loss=0.0002284, whisper_loss=0.09449, over 3905942.47 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:50:15,521 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
15 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 19:50:21,544 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-10 19:50:28,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723220.0, ans=0.1 2024-08-10 19:50:51,051 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 19:50:52,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=723420.0, ans=0.0 2024-08-10 19:50:55,656 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-10 19:51:26,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-08-10 19:51:29,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=12.0 2024-08-10 19:51:30,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14400, loss[loss=0.1062, beats_loss=0.01249, ecapa_loss=0.0002439, whisper_loss=0.09128, over 21856.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01176, ecapa_loss=0.0002301, whisper_loss=0.09476, over 3892764.77 frames. ], batch size: 91, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:51:48,008 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 19:52:00,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=723920.0, ans=0.0 2024-08-10 19:52:09,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=723920.0, ans=0.1 2024-08-10 19:52:11,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.768e+01 3.038e+01 3.446e+01 5.868e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-10 19:52:14,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=724020.0, ans=0.1 2024-08-10 19:52:18,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724020.0, ans=0.0 2024-08-10 19:52:41,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-08-10 19:52:47,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 5, batch 14450, loss[loss=0.09435, beats_loss=0.01431, ecapa_loss=0.0002235, whisper_loss=0.0778, over 22304.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01185, ecapa_loss=0.0002321, whisper_loss=0.09413, over 3897487.40 frames. ], batch size: 92, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:53:07,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=724320.0, ans=0.2 2024-08-10 19:53:07,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=724320.0, ans=0.125 2024-08-10 19:53:13,455 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 19:53:17,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=724320.0, ans=0.125 2024-08-10 19:53:18,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724420.0, ans=0.125 2024-08-10 19:53:28,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=724420.0, ans=0.0 2024-08-10 19:53:32,672 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-10 19:53:37,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-10 19:53:44,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=724520.0, ans=0.0 2024-08-10 19:53:47,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=724620.0, ans=0.0 2024-08-10 19:54:34,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 0, loss[loss=0.08479, beats_loss=0.01601, ecapa_loss=0.0001693, whisper_loss=0.06709, over 17261.00 frames. ], tot_loss[loss=0.08479, beats_loss=0.01601, ecapa_loss=0.0001693, whisper_loss=0.06709, over 17261.00 frames. ], batch size: 68, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:54:34,067 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 19:55:10,701 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on ASR_libri: loss=0.2614, beats_loss=0, ecapa_loss=0.0007237, whisper_loss=0.2541, over 922467.00 frames. 2024-08-10 19:55:26,767 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on SV_voxceleb1: loss=0.006205, beats_loss=0, ecapa_loss=0.0006205, whisper_loss=0, over 939242.00 frames. 
2024-08-10 19:57:12,758 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on AT_audioset: loss=0.02628, beats_loss=0.02628, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 19:57:12,761 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 19:57:12,962 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 19:57:24,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2024-08-10 19:57:40,551 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 19:57:40,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724750.0, ans=0.1 2024-08-10 19:57:55,337 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 19:58:00,361 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 19:58:15,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724850.0, ans=0.125 2024-08-10 19:58:25,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=724850.0, ans=0.125 2024-08-10 19:58:27,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-10 19:58:38,282 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-10 19:58:40,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.034e+01 3.419e+01 4.003e+01 7.099e+01, threshold=6.838e+01, percent-clipped=1.0 2024-08-10 19:58:54,592 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 19:59:06,164 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 19:59:10,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=725050.0, ans=0.2 2024-08-10 19:59:13,060 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 19:59:15,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 50, loss[loss=0.1127, beats_loss=0.01103, ecapa_loss=0.0002318, whisper_loss=0.09937, over 21764.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01133, ecapa_loss=0.0002367, whisper_loss=0.09359, over 870345.02 frames. ], batch size: 87, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:59:17,308 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 19:59:22,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725150.0, ans=0.125 2024-08-10 19:59:42,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=725250.0, ans=0.0 2024-08-10 19:59:42,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725250.0, ans=0.1 2024-08-10 19:59:44,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=725250.0, ans=0.125 2024-08-10 20:00:07,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725350.0, ans=0.125 2024-08-10 20:00:18,090 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 20:00:26,582 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 20:00:35,314 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 20:00:46,549 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 20:00:55,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.26 vs. limit=15.0 2024-08-10 20:01:00,244 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 20:01:09,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 100, loss[loss=0.1002, beats_loss=0.008828, ecapa_loss=0.0002739, whisper_loss=0.08861, over 15060.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01117, ecapa_loss=0.0002323, whisper_loss=0.09451, over 1521014.60 frames. 
], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:01:12,101 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 20:01:53,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725850.0, ans=0.0 2024-08-10 20:01:57,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=725850.0, ans=0.0 2024-08-10 20:02:22,647 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-10 20:02:23,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5 2024-08-10 20:02:24,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725950.0, ans=0.125 2024-08-10 20:02:28,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.882e+01 3.222e+01 3.754e+01 5.300e+01, threshold=6.444e+01, percent-clipped=0.0 2024-08-10 20:02:58,315 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 150, loss[loss=0.1085, beats_loss=0.009918, ecapa_loss=0.0002262, whisper_loss=0.09634, over 17273.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0113, ecapa_loss=0.0002304, whisper_loss=0.09482, over 2045436.55 frames. ], batch size: 66, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:02:58,678 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 20:03:27,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=726250.0, ans=0.0 2024-08-10 20:03:36,187 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 20:03:48,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=726350.0, ans=0.0 2024-08-10 20:04:07,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-10 20:04:08,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726550.0, ans=0.125 2024-08-10 20:04:13,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=726550.0, ans=0.2 2024-08-10 20:04:13,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-10 20:04:22,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 200, loss[loss=0.106, beats_loss=0.01406, ecapa_loss=0.0001954, whisper_loss=0.09002, over 23112.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01115, ecapa_loss=0.0002332, whisper_loss=0.09611, over 2436998.99 frames. ], batch size: 92, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:04:23,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=726650.0, ans=0.125 2024-08-10 20:04:59,030 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 20:05:02,282 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:05:19,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.639e+01 2.951e+01 3.334e+01 6.571e+01, threshold=5.903e+01, percent-clipped=1.0 2024-08-10 20:05:27,175 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
37 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 20:05:41,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 250, loss[loss=0.1119, beats_loss=0.01173, ecapa_loss=0.0002027, whisper_loss=0.09819, over 23249.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01126, ecapa_loss=0.0002303, whisper_loss=0.09614, over 2763810.26 frames. ], batch size: 91, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:05:49,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=727150.0, ans=15.0 2024-08-10 20:06:04,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-10 20:06:08,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=727250.0, ans=0.0 2024-08-10 20:06:45,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=727550.0, ans=0.07 2024-08-10 20:06:52,459 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 20:06:53,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 300, loss[loss=0.08514, beats_loss=0.01155, ecapa_loss=0.0002414, whisper_loss=0.07117, over 13834.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01135, ecapa_loss=0.0002293, whisper_loss=0.09553, over 2980732.24 frames. 
], batch size: 58, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:06:56,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=727650.0, ans=0.035 2024-08-10 20:07:05,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=727650.0, ans=0.0 2024-08-10 20:07:28,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=727850.0, ans=0.125 2024-08-10 20:07:32,133 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 6 from Vox, 33 fro AS 2024-08-10 20:07:38,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=727950.0, ans=0.125 2024-08-10 20:07:40,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727950.0, ans=0.125 2024-08-10 20:07:41,198 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 33 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 20:07:41,638 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.223e+00 2024-08-10 20:07:45,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.770e+01 3.156e+01 3.793e+01 6.617e+01, threshold=6.313e+01, percent-clipped=1.0 2024-08-10 20:07:53,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=728050.0, ans=0.125 2024-08-10 20:07:56,242 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 20:08:02,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728050.0, ans=0.1 2024-08-10 20:08:05,023 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 20:08:06,421 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 20:08:07,934 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 350, loss[loss=0.1026, beats_loss=0.01261, ecapa_loss=0.0002206, whisper_loss=0.08783, over 19400.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0114, ecapa_loss=0.0002272, whisper_loss=0.09447, over 3156028.53 frames. ], batch size: 75, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:08:09,886 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 20:09:01,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728450.0, ans=0.125 2024-08-10 20:09:10,025 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 20:09:21,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 400, loss[loss=0.0973, beats_loss=0.01406, ecapa_loss=0.0001872, whisper_loss=0.08136, over 14897.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01142, ecapa_loss=0.0002262, whisper_loss=0.09416, over 3305871.17 frames. ], batch size: 56, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:09:21,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=728650.0, ans=0.0 2024-08-10 20:09:41,698 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
33 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 20:09:41,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728750.0, ans=0.1 2024-08-10 20:10:08,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=728950.0, ans=0.0 2024-08-10 20:10:12,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.814e+01 3.145e+01 3.714e+01 1.358e+02, threshold=6.291e+01, percent-clipped=2.0 2024-08-10 20:10:20,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=729050.0, ans=0.125 2024-08-10 20:10:22,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=729050.0, ans=22.5 2024-08-10 20:10:26,013 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.151e-01 2024-08-10 20:10:26,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=729050.0, ans=0.125 2024-08-10 20:10:27,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=729050.0, ans=0.125 2024-08-10 20:10:28,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=729050.0, ans=0.125 2024-08-10 20:10:29,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=729050.0, ans=0.0 2024-08-10 20:10:33,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 450, loss[loss=0.1, beats_loss=0.01201, ecapa_loss=0.0002271, whisper_loss=0.08575, over 15968.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01147, ecapa_loss=0.0002258, whisper_loss=0.09423, over 3438618.65 frames. 
], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:10:44,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=729150.0, ans=0.0 2024-08-10 20:10:47,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-10 20:11:04,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=729350.0, ans=0.2 2024-08-10 20:11:09,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=729350.0, ans=0.125 2024-08-10 20:11:17,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=729450.0, ans=0.125 2024-08-10 20:11:20,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=729450.0, ans=0.125 2024-08-10 20:11:33,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-10 20:11:47,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 500, loss[loss=0.1179, beats_loss=0.01147, ecapa_loss=0.0001933, whisper_loss=0.1045, over 15446.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01142, ecapa_loss=0.0002251, whisper_loss=0.09445, over 3529613.99 frames. ], batch size: 59, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:11:56,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=729650.0, ans=0.0 2024-08-10 20:11:57,560 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
15 from LS+wenet, 34 from Vox, 27 fro AS 2024-08-10 20:11:59,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=729650.0, ans=15.0 2024-08-10 20:12:10,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=729750.0, ans=0.125 2024-08-10 20:12:32,392 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 20:12:32,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729950.0, ans=0.0 2024-08-10 20:12:41,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.724e+01 3.066e+01 3.405e+01 6.797e+01, threshold=6.131e+01, percent-clipped=1.0 2024-08-10 20:12:43,273 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 20:12:56,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=730050.0, ans=0.09899494936611666 2024-08-10 20:13:00,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=730050.0, ans=0.125 2024-08-10 20:13:02,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 550, loss[loss=0.099, beats_loss=0.01386, ecapa_loss=0.0001573, whisper_loss=0.08357, over 17648.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01144, ecapa_loss=0.0002241, whisper_loss=0.09426, over 3595773.80 frames. ], batch size: 68, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:13:09,388 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 20:13:09,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. 
limit=22.5 2024-08-10 20:13:13,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-10 20:13:21,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=730250.0, ans=15.0 2024-08-10 20:13:37,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=730350.0, ans=0.0 2024-08-10 20:13:39,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730350.0, ans=0.1 2024-08-10 20:13:43,647 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 20:14:06,074 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 20:14:22,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=730550.0, ans=0.025 2024-08-10 20:14:33,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730550.0, ans=0.1 2024-08-10 20:14:34,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=730550.0, ans=0.0 2024-08-10 20:14:39,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=730550.0, ans=0.0 2024-08-10 20:14:41,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=730650.0, ans=0.07 2024-08-10 20:14:42,494 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 600, loss[loss=0.07565, beats_loss=0.01228, ecapa_loss=0.0002577, whisper_loss=0.0608, over 17585.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01145, ecapa_loss=0.0002236, whisper_loss=0.09459, over 3650758.27 frames. ], batch size: 72, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:15:19,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=730850.0, ans=0.125 2024-08-10 20:15:19,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=730850.0, ans=0.5 2024-08-10 20:15:20,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=10.0 2024-08-10 20:15:29,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=730950.0, ans=6.0 2024-08-10 20:15:37,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.552e+01 2.834e+01 3.243e+01 4.859e+01, threshold=5.668e+01, percent-clipped=0.0 2024-08-10 20:15:56,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=731050.0, ans=0.0 2024-08-10 20:16:08,025 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 650, loss[loss=0.1156, beats_loss=0.01249, ecapa_loss=0.0002028, whisper_loss=0.1011, over 16354.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01151, ecapa_loss=0.0002225, whisper_loss=0.09416, over 3676752.54 frames. ], batch size: 64, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:16:15,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. 
limit=15.0 2024-08-10 20:16:19,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=731150.0, ans=0.125 2024-08-10 20:16:42,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=731250.0, ans=0.0 2024-08-10 20:16:55,795 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-10 20:17:23,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731450.0, ans=0.125 2024-08-10 20:17:50,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 700, loss[loss=0.11, beats_loss=0.01225, ecapa_loss=0.0002219, whisper_loss=0.09553, over 18183.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01139, ecapa_loss=0.0002229, whisper_loss=0.09469, over 3695970.22 frames. ], batch size: 71, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:17:59,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-08-10 20:18:10,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2024-08-10 20:18:25,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=731750.0, ans=0.125 2024-08-10 20:18:38,051 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 20:18:58,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=731850.0, ans=0.125 2024-08-10 20:19:08,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=731950.0, ans=0.07 2024-08-10 20:19:15,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.666e+01 3.015e+01 3.385e+01 4.873e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 20:19:17,288 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 20:19:17,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731950.0, ans=0.125 2024-08-10 20:19:33,033 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 20:19:47,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=732150.0, ans=0.02 2024-08-10 20:19:49,638 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 750, loss[loss=0.1232, beats_loss=0.01058, ecapa_loss=0.0002226, whisper_loss=0.1104, over 15896.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01147, ecapa_loss=0.0002205, whisper_loss=0.0939, over 3711749.86 frames. ], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:20:27,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732250.0, ans=0.1 2024-08-10 20:20:44,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.76 vs. 
limit=15.0 2024-08-10 20:20:48,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732350.0, ans=0.1 2024-08-10 20:20:55,561 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 20:21:10,253 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 20:21:20,918 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 20:21:28,822 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 20:21:47,125 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 20:21:48,268 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 800, loss[loss=0.08575, beats_loss=0.01454, ecapa_loss=0.000232, whisper_loss=0.0689, over 21577.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01149, ecapa_loss=0.0002211, whisper_loss=0.0939, over 3753622.56 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:21:55,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732650.0, ans=0.125 2024-08-10 20:22:00,660 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 20:22:08,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732650.0, ans=0.1 2024-08-10 20:22:12,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=732750.0, ans=0.125 2024-08-10 20:22:14,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.08 vs. 
limit=22.5 2024-08-10 20:22:24,875 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 20:22:26,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=732750.0, ans=0.125 2024-08-10 20:23:13,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.807e+01 3.275e+01 3.755e+01 8.468e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 20:23:18,015 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 20:23:18,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-10 20:23:28,159 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 20:23:33,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=12.0 2024-08-10 20:23:43,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 850, loss[loss=0.1079, beats_loss=0.01337, ecapa_loss=0.0001627, whisper_loss=0.09287, over 14934.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01145, ecapa_loss=0.000219, whisper_loss=0.09434, over 3806559.65 frames. ], batch size: 56, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:23:51,642 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 20:24:03,704 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
23 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-10 20:24:03,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=733250.0, ans=0.125 2024-08-10 20:24:33,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=733450.0, ans=0.0 2024-08-10 20:24:35,048 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 20:24:35,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733450.0, ans=0.1 2024-08-10 20:24:36,353 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 20:24:36,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733450.0, ans=0.125 2024-08-10 20:24:40,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=733450.0, ans=0.0 2024-08-10 20:24:55,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733550.0, ans=0.125 2024-08-10 20:24:55,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733550.0, ans=0.125 2024-08-10 20:25:00,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-10 20:25:02,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=733550.0, ans=0.0 2024-08-10 20:25:09,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 900, loss[loss=0.1159, beats_loss=0.01029, ecapa_loss=0.0001765, whisper_loss=0.1038, over 16617.00 frames. 
], tot_loss[loss=0.1079, beats_loss=0.01148, ecapa_loss=0.0002172, whisper_loss=0.09425, over 3790792.52 frames. ], batch size: 60, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:25:21,139 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 20:25:25,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=733650.0, ans=0.07 2024-08-10 20:25:29,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2024-08-10 20:25:34,205 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 20:25:46,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=733850.0, ans=0.0 2024-08-10 20:25:56,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=733850.0, ans=0.125 2024-08-10 20:26:06,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-10 20:26:12,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.731e+01 3.012e+01 3.536e+01 7.102e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-10 20:26:17,079 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 20:26:26,379 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 20:26:32,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:38,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 950, loss[loss=0.08467, beats_loss=0.01483, ecapa_loss=0.0002038, whisper_loss=0.0678, over 22062.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01153, ecapa_loss=0.0002165, whisper_loss=0.09406, over 3801922.41 frames. ], batch size: 92, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:26:49,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2024-08-10 20:27:01,165 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 20:27:04,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=734250.0, ans=0.125 2024-08-10 20:27:42,021 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 20:27:42,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=734450.0, ans=0.125 2024-08-10 20:27:43,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-08-10 20:27:49,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2024-08-10 20:27:55,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734550.0, ans=0.125 2024-08-10 20:27:57,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=734550.0, ans=0.1 2024-08-10 20:28:01,509 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1000, loss[loss=0.1192, beats_loss=0.007785, ecapa_loss=0.0002757, whisper_loss=0.1086, over 14504.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01147, ecapa_loss=0.0002152, whisper_loss=0.09415, over 3785221.57 frames. ], batch size: 58, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:28:03,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734650.0, ans=0.125 2024-08-10 20:28:05,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=734650.0, ans=0.0 2024-08-10 20:28:33,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=734850.0, ans=0.025 2024-08-10 20:28:34,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=734850.0, ans=0.1 2024-08-10 20:28:48,869 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 20:28:50,496 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 20:28:50,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=734950.0, ans=0.2 2024-08-10 20:29:00,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.726e+01 3.092e+01 3.601e+01 1.041e+02, threshold=6.184e+01, percent-clipped=1.0 2024-08-10 20:29:07,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=735050.0, ans=0.0 2024-08-10 20:29:08,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-10 20:29:21,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=735050.0, ans=0.125 2024-08-10 20:29:25,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1050, loss[loss=0.1196, beats_loss=0.01119, ecapa_loss=0.000231, whisper_loss=0.1061, over 16354.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01149, ecapa_loss=0.0002168, whisper_loss=0.09421, over 3784344.48 frames. ], batch size: 63, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:29:28,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-08-10 20:29:56,218 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 20:30:11,886 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 20:30:15,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=735350.0, ans=0.125 2024-08-10 20:30:20,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=735450.0, ans=0.125 2024-08-10 20:30:29,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=735450.0, ans=0.125 2024-08-10 20:30:35,763 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 20:30:44,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=735550.0, ans=0.125 2024-08-10 20:30:48,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=735550.0, ans=0.05 2024-08-10 20:30:50,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735650.0, ans=0.1 2024-08-10 20:30:51,903 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1100, loss[loss=0.1215, beats_loss=0.007307, ecapa_loss=0.0002479, whisper_loss=0.1117, over 22736.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01142, ecapa_loss=0.0002179, whisper_loss=0.09529, over 3813151.09 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:30:52,054 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 20:31:03,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.31 vs. limit=10.0 2024-08-10 20:31:22,180 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 20:31:22,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=735750.0, ans=0.125 2024-08-10 20:31:28,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=735850.0, ans=0.125 2024-08-10 20:31:40,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735950.0, ans=0.125 2024-08-10 20:31:50,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.770e+01 3.006e+01 3.661e+01 6.910e+01, threshold=6.012e+01, percent-clipped=1.0 2024-08-10 20:31:59,994 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 20:32:03,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=736050.0, ans=10.0 2024-08-10 20:32:15,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1150, loss[loss=0.1084, beats_loss=0.0131, ecapa_loss=0.0001949, whisper_loss=0.09333, over 19645.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01143, ecapa_loss=0.0002179, whisper_loss=0.0951, over 3789998.74 frames. ], batch size: 79, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:32:20,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2024-08-10 20:32:32,725 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 20:33:06,455 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
23 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-10 20:33:30,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736550.0, ans=0.0 2024-08-10 20:33:38,505 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 20:33:39,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-10 20:33:39,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1200, loss[loss=0.1034, beats_loss=0.01148, ecapa_loss=0.0002124, whisper_loss=0.0898, over 20459.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01145, ecapa_loss=0.000217, whisper_loss=0.09556, over 3826148.03 frames. ], batch size: 80, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:33:43,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-08-10 20:34:04,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=736750.0, ans=0.125 2024-08-10 20:34:12,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-10 20:34:15,676 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 20:34:33,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.774e+01 3.140e+01 3.554e+01 5.402e+01, threshold=6.279e+01, percent-clipped=0.0 2024-08-10 20:34:42,321 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 20:34:57,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1250, loss[loss=0.08865, beats_loss=0.01287, ecapa_loss=0.0002344, whisper_loss=0.07344, over 15911.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01156, ecapa_loss=0.0002153, whisper_loss=0.09416, over 3791071.31 frames. ], batch size: 63, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:35:08,834 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 20:35:10,546 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 20:35:13,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=737250.0, ans=0.125 2024-08-10 20:35:17,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=737250.0, ans=0.0 2024-08-10 20:35:27,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737350.0, ans=0.1 2024-08-10 20:35:32,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2024-08-10 20:35:56,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.56 vs. limit=15.0 2024-08-10 20:36:12,435 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1300, loss[loss=0.1056, beats_loss=0.01422, ecapa_loss=0.0001763, whisper_loss=0.08965, over 20433.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0116, ecapa_loss=0.0002148, whisper_loss=0.09453, over 3808319.30 frames. 
], batch size: 76, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:36:14,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=737650.0, ans=0.125 2024-08-10 20:36:17,976 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 20:36:27,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=737750.0, ans=0.0 2024-08-10 20:36:29,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-10 20:36:43,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=737850.0, ans=0.0 2024-08-10 20:36:46,528 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 20:36:47,768 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 20:36:49,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737850.0, ans=0.1 2024-08-10 20:37:00,699 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 20:37:08,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.805e+01 3.070e+01 3.591e+01 5.506e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 20:37:32,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=738150.0, ans=0.0 2024-08-10 20:37:34,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1350, loss[loss=0.07698, beats_loss=0.01491, ecapa_loss=0.0002019, whisper_loss=0.06005, over 16417.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.0116, ecapa_loss=0.0002134, whisper_loss=0.09409, over 3796434.60 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:37:47,230 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 20:37:53,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-10 20:38:02,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2024-08-10 20:38:11,502 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 20:38:12,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-08-10 20:38:15,157 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 20:38:20,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=738350.0, ans=0.0 2024-08-10 20:38:39,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=738550.0, ans=0.2 2024-08-10 20:38:53,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=738550.0, ans=0.0 2024-08-10 20:38:56,044 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1400, loss[loss=0.09184, beats_loss=0.01233, ecapa_loss=0.0002037, whisper_loss=0.07747, over 22302.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01156, ecapa_loss=0.0002124, whisper_loss=0.09415, over 3829190.47 frames. 
], batch size: 92, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:39:34,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738850.0, ans=0.125 2024-08-10 20:39:56,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.641e+01 2.966e+01 3.393e+01 5.160e+01, threshold=5.932e+01, percent-clipped=0.0 2024-08-10 20:39:58,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738950.0, ans=0.1 2024-08-10 20:40:05,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739050.0, ans=0.125 2024-08-10 20:40:07,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=739050.0, ans=0.0 2024-08-10 20:40:07,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-10 20:40:08,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=739050.0, ans=0.125 2024-08-10 20:40:18,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2024-08-10 20:40:23,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1450, loss[loss=0.09157, beats_loss=0.01217, ecapa_loss=0.0002593, whisper_loss=0.0768, over 17971.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01157, ecapa_loss=0.0002126, whisper_loss=0.09348, over 3830100.70 frames. 
], batch size: 74, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:40:25,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739150.0, ans=0.1 2024-08-10 20:41:25,070 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:41:26,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739350.0, ans=0.125 2024-08-10 20:41:43,153 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 20:42:03,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=739550.0, ans=0.0 2024-08-10 20:42:07,387 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 20:42:18,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1500, loss[loss=0.09964, beats_loss=0.01054, ecapa_loss=0.0002338, whisper_loss=0.08676, over 23705.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01155, ecapa_loss=0.0002128, whisper_loss=0.09337, over 3819898.21 frames. ], batch size: 93, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:42:32,853 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 20:42:49,202 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 20:42:59,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=739850.0, ans=0.0 2024-08-10 20:42:59,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739850.0, ans=0.1 2024-08-10 20:43:00,590 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.078e-02 2024-08-10 20:43:06,789 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 20:43:07,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=739950.0, ans=0.07 2024-08-10 20:43:13,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=739950.0, ans=0.125 2024-08-10 20:43:14,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.729e+01 3.073e+01 3.413e+01 6.253e+01, threshold=6.146e+01, percent-clipped=1.0 2024-08-10 20:43:30,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-10 20:43:38,789 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1550, loss[loss=0.1088, beats_loss=0.009559, ecapa_loss=0.0002363, whisper_loss=0.0969, over 17283.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002127, whisper_loss=0.0936, over 3796198.46 frames. 
], batch size: 67, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:43:41,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=740150.0, ans=0.125 2024-08-10 20:43:53,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=740250.0, ans=0.0 2024-08-10 20:44:00,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=740250.0, ans=0.2 2024-08-10 20:44:05,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=12.0 2024-08-10 20:44:12,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=740350.0, ans=0.2 2024-08-10 20:44:15,811 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 20:44:48,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=740550.0, ans=0.125 2024-08-10 20:44:52,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740550.0, ans=0.125 2024-08-10 20:45:00,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1600, loss[loss=0.1085, beats_loss=0.009126, ecapa_loss=0.0002334, whisper_loss=0.09699, over 18287.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0002111, whisper_loss=0.09363, over 3798982.62 frames. ], batch size: 72, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:45:03,323 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
25 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 20:45:07,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740650.0, ans=0.0 2024-08-10 20:45:08,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740650.0, ans=0.125 2024-08-10 20:45:12,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-08-10 20:45:15,037 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 20:45:20,215 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 20:45:21,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=740750.0, ans=0.2 2024-08-10 20:45:58,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.574e+01 2.929e+01 3.457e+01 5.264e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-10 20:46:05,757 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-10 20:46:23,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1650, loss[loss=0.1068, beats_loss=0.01071, ecapa_loss=0.000241, whisper_loss=0.09368, over 17645.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01147, ecapa_loss=0.0002114, whisper_loss=0.09371, over 3799932.22 frames. ], batch size: 72, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:46:26,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-10 20:46:31,150 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 20:46:34,770 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 20:46:47,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741250.0, ans=0.125 2024-08-10 20:46:50,803 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 20:47:15,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-10 20:47:19,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=741450.0, ans=0.125 2024-08-10 20:47:28,953 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 20:47:40,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1700, loss[loss=0.1179, beats_loss=0.00998, ecapa_loss=0.0002341, whisper_loss=0.1056, over 13960.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01146, ecapa_loss=0.000214, whisper_loss=0.094, over 3812290.83 frames. ], batch size: 55, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:47:49,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=741650.0, ans=0.0 2024-08-10 20:47:52,060 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 20:48:10,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=741850.0, ans=0.125 2024-08-10 20:48:11,746 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:48:34,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.737e+01 3.042e+01 3.583e+01 5.597e+01, threshold=6.084e+01, percent-clipped=0.0 2024-08-10 20:48:37,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=741950.0, ans=0.2 2024-08-10 20:48:37,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=741950.0, ans=0.0 2024-08-10 20:48:56,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1750, loss[loss=0.09909, beats_loss=0.0118, ecapa_loss=0.0002227, whisper_loss=0.08507, over 16980.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01144, ecapa_loss=0.000214, whisper_loss=0.09411, over 3798650.13 frames. ], batch size: 66, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:49:07,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=742150.0, ans=0.07 2024-08-10 20:49:20,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-10 20:49:22,564 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 20:49:22,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=742250.0, ans=0.0 2024-08-10 20:49:35,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=742350.0, ans=0.2 2024-08-10 20:49:41,406 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 20:50:11,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1800, loss[loss=0.09728, beats_loss=0.008933, ecapa_loss=0.0002411, whisper_loss=0.08593, over 14757.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.0002145, whisper_loss=0.09394, over 3812292.96 frames. ], batch size: 60, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:50:12,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=742650.0, ans=12.0 2024-08-10 20:50:26,888 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 20:50:48,645 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-10 20:51:05,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.668e+01 3.016e+01 3.512e+01 6.004e+01, threshold=6.033e+01, percent-clipped=0.0 2024-08-10 20:51:13,863 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 20:51:22,189 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 20:51:29,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1850, loss[loss=0.11, beats_loss=0.01296, ecapa_loss=0.0002125, whisper_loss=0.09491, over 20114.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01147, ecapa_loss=0.0002132, whisper_loss=0.09299, over 3814699.77 frames. 
], batch size: 78, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:51:33,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=743150.0, ans=0.125 2024-08-10 20:51:52,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=743250.0, ans=0.0 2024-08-10 20:51:55,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2024-08-10 20:52:02,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=12.0 2024-08-10 20:52:05,323 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 20:52:44,290 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1900, loss[loss=0.102, beats_loss=0.01415, ecapa_loss=0.0001839, whisper_loss=0.08605, over 19384.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0115, ecapa_loss=0.0002161, whisper_loss=0.09275, over 3823011.36 frames. ], batch size: 76, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:52:47,604 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 20:52:53,447 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 20:52:53,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743650.0, ans=0.125 2024-08-10 20:52:59,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743750.0, ans=0.125 2024-08-10 20:53:06,338 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 20:53:11,944 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 20:53:19,112 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-10 20:53:36,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.690e+01 3.145e+01 3.666e+01 6.863e+01, threshold=6.290e+01, percent-clipped=1.0 2024-08-10 20:53:40,145 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 20:53:44,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-10 20:54:01,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 1950, loss[loss=0.09754, beats_loss=0.01432, ecapa_loss=0.0001869, whisper_loss=0.08135, over 21503.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0116, ecapa_loss=0.0002175, whisper_loss=0.09244, over 3811855.03 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:54:01,776 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 20:54:21,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744250.0, ans=0.1 2024-08-10 20:54:21,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=744250.0, ans=0.125 2024-08-10 20:54:24,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=744250.0, ans=0.0 2024-08-10 20:54:27,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=744250.0, ans=0.0 2024-08-10 20:54:36,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.80 vs. 
limit=12.0 2024-08-10 20:55:18,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2000, loss[loss=0.1139, beats_loss=0.01258, ecapa_loss=0.0002261, whisper_loss=0.09908, over 22841.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01172, ecapa_loss=0.0002188, whisper_loss=0.09178, over 3828014.45 frames. ], batch size: 92, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:55:31,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=744650.0, ans=0.2 2024-08-10 20:55:41,639 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 20:55:41,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=744750.0, ans=0.0 2024-08-10 20:55:55,426 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:55:56,645 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 20:56:03,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=744850.0, ans=0.125 2024-08-10 20:56:06,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=744950.0, ans=0.5 2024-08-10 20:56:08,111 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 20:56:16,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.753e+01 3.103e+01 3.441e+01 5.353e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-10 20:56:30,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745050.0, ans=0.1 2024-08-10 20:56:33,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=12.0 2024-08-10 20:56:42,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2050, loss[loss=0.07188, beats_loss=0.01575, ecapa_loss=0.0002003, whisper_loss=0.05413, over 17506.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0117, ecapa_loss=0.0002205, whisper_loss=0.0921, over 3821357.14 frames. ], batch size: 76, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:57:05,673 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 20:57:41,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-10 20:57:46,178 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 20:58:02,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2100, loss[loss=0.1195, beats_loss=0.01021, ecapa_loss=0.0002565, whisper_loss=0.1067, over 16365.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01178, ecapa_loss=0.0002207, whisper_loss=0.09211, over 3836102.85 frames. ], batch size: 70, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:58:07,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.56 vs. 
limit=12.0 2024-08-10 20:58:20,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=12.0 2024-08-10 20:58:32,952 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 20:58:36,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2024-08-10 20:58:47,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745850.0, ans=0.1 2024-08-10 20:58:55,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.34 vs. limit=22.5 2024-08-10 20:58:59,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2024-08-10 20:59:03,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=745950.0, ans=0.2 2024-08-10 20:59:06,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.787e+01 3.226e+01 3.870e+01 7.991e+01, threshold=6.452e+01, percent-clipped=3.0 2024-08-10 20:59:15,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746050.0, ans=0.125 2024-08-10 20:59:19,055 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 20:59:31,397 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2150, loss[loss=0.1145, beats_loss=0.01145, ecapa_loss=0.0002712, whisper_loss=0.1003, over 21643.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01183, ecapa_loss=0.0002215, whisper_loss=0.0921, over 3836859.59 frames. 
], batch size: 89, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:59:34,578 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 20:59:36,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=746150.0, ans=0.2 2024-08-10 20:59:43,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746150.0, ans=0.1 2024-08-10 20:59:45,630 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 20:59:47,687 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 20:59:51,076 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 20:59:53,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-10 21:00:00,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=746250.0, ans=0.04949747468305833 2024-08-10 21:00:16,726 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 21:00:19,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-10 21:00:22,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746350.0, ans=0.125 2024-08-10 21:00:25,003 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 21:00:35,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=12.0 2024-08-10 21:00:57,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2200, loss[loss=0.1269, beats_loss=0.01087, ecapa_loss=0.0001747, whisper_loss=0.1143, over 21545.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01175, ecapa_loss=0.0002211, whisper_loss=0.09319, over 3833581.34 frames. ], batch size: 77, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:01:00,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746650.0, ans=0.1 2024-08-10 21:01:06,117 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 21:01:19,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=746750.0, ans=0.125 2024-08-10 21:01:27,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=746750.0, ans=0.2 2024-08-10 21:01:27,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=746750.0, ans=0.02 2024-08-10 21:01:59,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.667e+01 3.183e+01 3.944e+01 1.052e+02, threshold=6.365e+01, percent-clipped=1.0 2024-08-10 21:02:00,676 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 21:02:14,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=747050.0, ans=0.125 2024-08-10 21:02:20,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=747050.0, ans=0.2 2024-08-10 21:02:24,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2250, loss[loss=0.1154, beats_loss=0.01109, ecapa_loss=0.0002083, whisper_loss=0.1023, over 22168.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0117, ecapa_loss=0.0002214, whisper_loss=0.09404, over 3856324.01 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:02:29,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=747150.0, ans=0.2 2024-08-10 21:02:29,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=747150.0, ans=0.0 2024-08-10 21:02:29,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-08-10 21:02:36,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=747150.0, ans=0.0 2024-08-10 21:02:41,874 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 21:02:48,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=747250.0, ans=0.125 2024-08-10 21:03:01,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=747350.0, ans=0.0 2024-08-10 21:03:19,601 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 21:03:33,054 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 21:03:33,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-10 21:03:51,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2300, loss[loss=0.1163, beats_loss=0.01101, ecapa_loss=0.00023, whisper_loss=0.103, over 22213.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01164, ecapa_loss=0.0002225, whisper_loss=0.09534, over 3894686.84 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:03:51,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=747650.0, ans=0.125 2024-08-10 21:04:09,710 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 21:04:12,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747750.0, ans=0.125 2024-08-10 21:04:16,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=747750.0, ans=0.2 2024-08-10 21:04:53,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.764e+01 3.059e+01 3.552e+01 5.257e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-10 21:05:01,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=748050.0, ans=0.125 2024-08-10 21:05:19,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2350, loss[loss=0.1113, beats_loss=0.009474, ecapa_loss=0.0002314, whisper_loss=0.09949, over 17593.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01161, ecapa_loss=0.0002222, whisper_loss=0.09552, over 3863784.55 frames. 
], batch size: 66, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:05:28,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=748150.0, ans=0.0 2024-08-10 21:05:29,728 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 21:06:05,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.69 vs. limit=15.0 2024-08-10 21:06:45,349 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 21:06:54,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=748550.0, ans=0.0 2024-08-10 21:06:59,712 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.420e+01 2024-08-10 21:07:01,548 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 21:07:08,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2400, loss[loss=0.1467, beats_loss=0.007337, ecapa_loss=0.000245, whisper_loss=0.1369, over 22681.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01153, ecapa_loss=0.0002236, whisper_loss=0.09591, over 3860373.99 frames. 
], batch size: 83, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:07:30,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=748650.0, ans=0.0 2024-08-10 21:07:37,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=748750.0, ans=0.0 2024-08-10 21:07:47,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748750.0, ans=0.125 2024-08-10 21:07:48,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5 2024-08-10 21:07:53,711 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 21:08:11,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=748850.0, ans=0.05 2024-08-10 21:08:32,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=748950.0, ans=0.2 2024-08-10 21:08:38,569 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:08:45,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=748950.0, ans=0.09899494936611666 2024-08-10 21:08:47,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.712e+01 3.107e+01 3.563e+01 2.420e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-10 21:09:07,344 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 21:09:27,497 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
34 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 21:09:29,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2450, loss[loss=0.1332, beats_loss=0.01039, ecapa_loss=0.0002107, whisper_loss=0.1207, over 22514.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.0116, ecapa_loss=0.000223, whisper_loss=0.09563, over 3861615.72 frames. ], batch size: 84, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:09:54,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=749250.0, ans=0.0 2024-08-10 21:09:58,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=749250.0, ans=0.2 2024-08-10 21:10:09,654 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 21:10:20,249 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 21:10:25,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749350.0, ans=0.1 2024-08-10 21:10:37,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-08-10 21:10:52,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=749550.0, ans=0.0 2024-08-10 21:11:01,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2500, loss[loss=0.118, beats_loss=0.009795, ecapa_loss=0.0002471, whisper_loss=0.1057, over 22369.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01147, ecapa_loss=0.0002236, whisper_loss=0.09634, over 3879992.87 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:11:04,898 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 21:11:14,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=749650.0, ans=0.0 2024-08-10 21:11:20,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749750.0, ans=0.125 2024-08-10 21:11:23,957 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 21:11:29,817 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 21:11:47,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=749850.0, ans=0.125 2024-08-10 21:11:52,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.61 vs. limit=22.5 2024-08-10 21:12:03,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.786e+01 3.132e+01 3.631e+01 5.389e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:12:32,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2550, loss[loss=0.1044, beats_loss=0.01459, ecapa_loss=0.0002133, whisper_loss=0.08768, over 23473.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01144, ecapa_loss=0.0002245, whisper_loss=0.09622, over 3883482.60 frames. ], batch size: 94, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:12:33,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2024-08-10 21:12:34,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=750150.0, ans=0.2 2024-08-10 21:12:36,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-10 21:12:44,365 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 21:12:49,921 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 21:13:15,074 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 21:13:32,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=750450.0, ans=0.125 2024-08-10 21:13:33,757 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 21:13:49,634 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 21:14:07,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2600, loss[loss=0.08663, beats_loss=0.01308, ecapa_loss=0.000199, whisper_loss=0.07157, over 21429.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01152, ecapa_loss=0.0002245, whisper_loss=0.09556, over 3880970.23 frames. ], batch size: 86, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:14:08,131 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 21:14:09,258 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 21:14:27,899 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 21:14:32,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=750750.0, ans=0.125 2024-08-10 21:14:52,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=750850.0, ans=0.125 2024-08-10 21:15:01,783 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-10 21:15:09,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.826e+01 3.235e+01 3.900e+01 8.164e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 21:15:15,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=750950.0, ans=0.07 2024-08-10 21:15:16,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=751050.0, ans=0.2 2024-08-10 21:15:28,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=751050.0, ans=0.2 2024-08-10 21:15:33,788 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2650, loss[loss=0.1171, beats_loss=0.0114, ecapa_loss=0.0001841, whisper_loss=0.1038, over 21007.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01156, ecapa_loss=0.0002251, whisper_loss=0.09477, over 3848344.01 frames. ], batch size: 77, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:15:43,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751150.0, ans=0.1 2024-08-10 21:15:43,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.05 vs. 
limit=12.0 2024-08-10 21:15:44,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751150.0, ans=0.1 2024-08-10 21:15:47,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=751150.0, ans=0.1 2024-08-10 21:15:58,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751250.0, ans=0.1 2024-08-10 21:16:49,351 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 21:16:57,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=12.0 2024-08-10 21:17:02,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2700, loss[loss=0.1198, beats_loss=0.01024, ecapa_loss=0.0002357, whisper_loss=0.1072, over 18386.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01151, ecapa_loss=0.0002241, whisper_loss=0.09548, over 3884531.03 frames. ], batch size: 72, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:17:07,682 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 21:17:09,366 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-10 21:17:41,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=751850.0, ans=0.2 2024-08-10 21:17:46,297 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 21:17:48,348 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-10 21:17:57,636 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
18 from LS+wenet, 17 from Vox, 22 from AS 2024-08-10 21:18:01,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=751950.0, ans=0.125 2024-08-10 21:18:03,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 3.016e+01 3.341e+01 3.971e+01 1.144e+02, threshold=6.682e+01, percent-clipped=3.0 2024-08-10 21:18:07,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751950.0, ans=0.1 2024-08-10 21:18:10,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2024-08-10 21:18:13,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-10 21:18:16,460 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 21:18:21,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=752050.0, ans=0.07 2024-08-10 21:18:28,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2750, loss[loss=0.1006, beats_loss=0.01026, ecapa_loss=0.0002432, whisper_loss=0.08789, over 15624.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01152, ecapa_loss=0.000224, whisper_loss=0.09588, over 3874143.42 frames. ], batch size: 65, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:18:30,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. 
limit=15.0 2024-08-10 21:18:38,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=752150.0, ans=0.0 2024-08-10 21:18:57,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=752250.0, ans=0.2 2024-08-10 21:18:59,002 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 18 from LS+wenet, 26 from Vox, 38 from AS 2024-08-10 21:19:15,887 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 21:19:33,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=752450.0, ans=0.125 2024-08-10 21:19:54,843 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2800, loss[loss=0.1016, beats_loss=0.01013, ecapa_loss=0.0002496, whisper_loss=0.08896, over 15867.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.0115, ecapa_loss=0.0002246, whisper_loss=0.09643, over 3884784.12 frames. ], batch size: 66, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:19:55,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=752650.0, ans=0.07 2024-08-10 21:20:00,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=752650.0, ans=0.125 2024-08-10 21:20:03,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=752650.0, ans=0.05 2024-08-10 21:20:03,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-08-10 21:20:15,219 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 21:20:16,836 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 20 from Vox, 31 from AS 2024-08-10 21:20:27,056 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 21:20:27,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=752850.0, ans=0.125 2024-08-10 21:20:30,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752850.0, ans=0.1 2024-08-10 21:20:46,346 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 9 from Vox, 29 from AS 2024-08-10 21:20:53,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.721e+01 3.078e+01 3.353e+01 6.515e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-10 21:21:20,792 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2850, loss[loss=0.1023, beats_loss=0.01258, ecapa_loss=0.0002653, whisper_loss=0.08708, over 21043.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.0115, ecapa_loss=0.0002239, whisper_loss=0.09644, over 3896019.75 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:21:46,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=753250.0, ans=0.0 2024-08-10 21:21:46,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=753250.0, ans=0.0 2024-08-10 21:22:15,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=753350.0, ans=0.125 2024-08-10 21:22:23,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=753450.0, ans=0.04949747468305833 2024-08-10 21:22:29,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. 
limit=22.5 2024-08-10 21:22:30,615 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 21:22:40,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=753550.0, ans=0.125 2024-08-10 21:22:48,216 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 21:22:53,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2900, loss[loss=0.1102, beats_loss=0.01049, ecapa_loss=0.0002527, whisper_loss=0.09719, over 18164.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01153, ecapa_loss=0.0002256, whisper_loss=0.09595, over 3885139.17 frames. ], batch size: 75, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:23:13,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2024-08-10 21:23:31,372 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.189e-02 2024-08-10 21:23:53,383 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.758e+01 3.070e+01 3.678e+01 5.521e+01, threshold=6.141e+01, percent-clipped=0.0 2024-08-10 21:23:53,639 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 21 from Vox, 23 from AS 2024-08-10 21:24:18,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 2950, loss[loss=0.1255, beats_loss=0.01169, ecapa_loss=0.0002486, whisper_loss=0.1113, over 19248.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0115, ecapa_loss=0.000227, whisper_loss=0.09556, over 3891171.19 frames. 
], batch size: 74, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:24:44,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=754250.0, ans=0.125 2024-08-10 21:24:44,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754250.0, ans=0.125 2024-08-10 21:24:44,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754250.0, ans=0.1 2024-08-10 21:25:02,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754350.0, ans=0.125 2024-08-10 21:25:03,984 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 21:25:34,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754550.0, ans=0.125 2024-08-10 21:25:36,316 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 from AS 2024-08-10 21:25:39,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3000, loss[loss=0.1102, beats_loss=0.01264, ecapa_loss=0.0002176, whisper_loss=0.09536, over 18446.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01156, ecapa_loss=0.0002261, whisper_loss=0.09546, over 3890368.10 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:25:39,062 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 21:26:18,892 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0007066, whisper_loss=0.2527, over 922467.00 frames. 
2024-08-10 21:26:36,963 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8803, 2.8675, 3.1690, 1.6647, 1.7650, 2.7813, 3.2985, 3.0483], device='cuda:1') 2024-08-10 21:26:38,492 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on SV_voxceleb1: loss=0.005938, beats_loss=0, ecapa_loss=0.0005938, whisper_loss=0, over 939242.00 frames. 2024-08-10 21:27:59,385 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8436, 4.1378, 2.5676, 4.5970], device='cuda:1') 2024-08-10 21:28:35,342 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7673, 1.4604, 1.5797, 1.6943], device='cuda:1') 2024-08-10 21:28:42,300 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on AT_audioset: loss=0.02614, beats_loss=0.02614, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 21:28:42,303 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 21:28:45,022 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 21:29:03,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. 
limit=15.0 2024-08-10 21:29:15,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=754850.0, ans=0.125 2024-08-10 21:29:16,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=754850.0, ans=0.125 2024-08-10 21:29:18,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=754850.0, ans=0.95 2024-08-10 21:29:23,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754850.0, ans=0.125 2024-08-10 21:29:38,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.874e+01 3.287e+01 3.873e+01 6.300e+01, threshold=6.573e+01, percent-clipped=1.0 2024-08-10 21:30:00,942 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 14 from Vox, 45 from AS 2024-08-10 21:30:02,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3050, loss[loss=0.1127, beats_loss=0.01452, ecapa_loss=0.0002105, whisper_loss=0.09604, over 21403.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.0116, ecapa_loss=0.0002267, whisper_loss=0.09573, over 3885522.89 frames. 
], batch size: 87, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:30:15,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=755150.0, ans=0.0 2024-08-10 21:30:30,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=755250.0, ans=0.0 2024-08-10 21:30:36,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=755350.0, ans=0.125 2024-08-10 21:30:46,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=755350.0, ans=0.125 2024-08-10 21:31:14,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=755550.0, ans=0.125 2024-08-10 21:31:22,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3100, loss[loss=0.1213, beats_loss=0.0101, ecapa_loss=0.0002069, whisper_loss=0.1091, over 15834.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.0116, ecapa_loss=0.0002257, whisper_loss=0.09563, over 3890300.63 frames. ], batch size: 60, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:31:36,800 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 21:31:37,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=755650.0, ans=0.2 2024-08-10 21:31:45,908 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
15 from LS+wenet, 15 from Vox, 25 from AS 2024-08-10 21:31:52,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=755750.0, ans=0.0 2024-08-10 21:31:59,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=755850.0, ans=0.125 2024-08-10 21:32:08,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-08-10 21:32:11,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755950.0, ans=0.1 2024-08-10 21:32:13,182 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 from AS 2024-08-10 21:32:21,338 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.626e+01 2.939e+01 3.498e+01 4.571e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-10 21:32:28,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=756050.0, ans=0.125 2024-08-10 21:32:34,347 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:32:41,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=756050.0, ans=0.2 2024-08-10 21:32:44,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3150, loss[loss=0.1153, beats_loss=0.01046, ecapa_loss=0.0002251, whisper_loss=0.1026, over 22870.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01163, ecapa_loss=0.0002236, whisper_loss=0.09602, over 3916412.16 frames. ], batch size: 93, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:32:51,131 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
30 from LS+wenet, 26 from Vox, 39 from AS 2024-08-10 21:33:01,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=756250.0, ans=0.125 2024-08-10 21:33:02,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2024-08-10 21:33:17,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=756350.0, ans=0.125 2024-08-10 21:33:19,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=756350.0, ans=0.125 2024-08-10 21:33:50,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=756550.0, ans=10.0 2024-08-10 21:33:53,165 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 21:33:56,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=756550.0, ans=0.125 2024-08-10 21:33:59,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=756550.0, ans=0.1 2024-08-10 21:34:04,961 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3200, loss[loss=0.1071, beats_loss=0.01224, ecapa_loss=0.0002258, whisper_loss=0.09258, over 22576.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01162, ecapa_loss=0.0002233, whisper_loss=0.09615, over 3899807.18 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:34:41,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:46,383 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 21:34:47,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:47,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:49,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:35:03,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.765e+01 3.113e+01 3.844e+01 7.476e+01, threshold=6.225e+01, percent-clipped=4.0 2024-08-10 21:35:03,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756950.0, ans=0.1 2024-08-10 21:35:09,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-10 21:35:10,638 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 21:35:18,352 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS 2024-08-10 21:35:26,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3250, loss[loss=0.1077, beats_loss=0.01054, ecapa_loss=0.0002332, whisper_loss=0.09481, over 18763.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01161, ecapa_loss=0.0002235, whisper_loss=0.09583, over 3861987.33 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:35:56,083 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 21:36:14,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=757350.0, ans=0.0 2024-08-10 21:36:24,260 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 21:36:27,236 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 21:36:44,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=757550.0, ans=0.1 2024-08-10 21:36:44,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-10 21:36:46,960 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 from AS 2024-08-10 21:36:47,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757550.0, ans=0.125 2024-08-10 21:36:50,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3300, loss[loss=0.1234, beats_loss=0.01002, ecapa_loss=0.0001921, whisper_loss=0.1114, over 24317.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01159, ecapa_loss=0.0002246, whisper_loss=0.09583, over 3871638.19 frames. ], batch size: 93, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:36:53,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0 2024-08-10 21:37:09,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757750.0, ans=0.125 2024-08-10 21:37:12,091 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 19 from Vox, 20 from AS 2024-08-10 21:37:12,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=757750.0, ans=0.125 2024-08-10 21:37:37,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=757850.0, ans=0.125 2024-08-10 21:37:46,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-08-10 21:37:50,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.783e+01 3.080e+01 3.590e+01 5.176e+01, threshold=6.160e+01, percent-clipped=0.0 2024-08-10 21:38:00,735 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 from AS 2024-08-10 21:38:04,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2024-08-10 21:38:08,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=758050.0, ans=0.125 2024-08-10 21:38:15,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3350, loss[loss=0.09307, beats_loss=0.01276, ecapa_loss=0.0001832, whisper_loss=0.07849, over 17516.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002244, whisper_loss=0.09483, over 3828148.73 frames. ], batch size: 69, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:38:23,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758150.0, ans=0.1 2024-08-10 21:38:30,953 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
18 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 21:39:02,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2024-08-10 21:39:05,584 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 21:39:28,869 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 21:39:33,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3400, loss[loss=0.09022, beats_loss=0.01403, ecapa_loss=0.0002564, whisper_loss=0.07362, over 21607.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01167, ecapa_loss=0.0002226, whisper_loss=0.09459, over 3885293.51 frames. ], batch size: 93, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:39:52,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=758750.0, ans=0.0 2024-08-10 21:40:07,590 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS 2024-08-10 21:40:25,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0 2024-08-10 21:40:29,143 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 from AS 2024-08-10 21:40:32,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.745e+01 3.132e+01 3.636e+01 5.691e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:40:36,814 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 36 from LS+wenet, 12 from Vox, 38 from AS 2024-08-10 21:40:46,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=759050.0, ans=0.2 2024-08-10 21:40:52,969 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 21:40:53,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=759050.0, ans=0.125 2024-08-10 21:40:54,908 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 21:40:56,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3450, loss[loss=0.125, beats_loss=0.009794, ecapa_loss=0.000232, whisper_loss=0.1129, over 22731.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01171, ecapa_loss=0.0002212, whisper_loss=0.09465, over 3914535.47 frames. ], batch size: 89, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:40:58,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-08-10 21:40:59,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=759150.0, ans=0.125 2024-08-10 21:41:06,022 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS 2024-08-10 21:41:09,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=759150.0, ans=0.125 2024-08-10 21:41:23,693 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 21:41:43,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.08 vs. limit=15.0 2024-08-10 21:41:46,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=759450.0, ans=0.0 2024-08-10 21:41:55,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. 
limit=15.0 2024-08-10 21:41:59,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.46 vs. limit=22.5 2024-08-10 21:42:17,830 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 from AS 2024-08-10 21:42:19,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3500, loss[loss=0.08295, beats_loss=0.01552, ecapa_loss=0.0002173, whisper_loss=0.06526, over 13043.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01177, ecapa_loss=0.0002228, whisper_loss=0.09395, over 3911603.47 frames. ], batch size: 55, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:42:26,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759650.0, ans=0.125 2024-08-10 21:42:29,892 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:42:33,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=759750.0, ans=0.0 2024-08-10 21:42:41,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759750.0, ans=0.1 2024-08-10 21:42:48,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-08-10 21:43:06,374 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 21:43:11,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.666e+01 2.958e+01 3.304e+01 6.870e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-10 21:43:31,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3550, loss[loss=0.1139, beats_loss=0.01284, ecapa_loss=0.0002279, whisper_loss=0.09879, over 22106.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01173, ecapa_loss=0.0002226, whisper_loss=0.094, over 3890785.23 frames. ], batch size: 90, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:43:42,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=760150.0, ans=0.0 2024-08-10 21:43:55,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-10 21:44:04,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760350.0, ans=0.125 2024-08-10 21:44:09,980 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 from AS 2024-08-10 21:44:10,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=760450.0, ans=0.0 2024-08-10 21:44:13,885 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 from AS 2024-08-10 21:44:15,308 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 21:44:18,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=760450.0, ans=0.0 2024-08-10 21:44:19,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=760450.0, ans=0.125 2024-08-10 21:44:27,921 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 21:44:36,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3600, loss[loss=0.1049, beats_loss=0.01226, ecapa_loss=0.0002282, whisper_loss=0.09039, over 14164.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01174, ecapa_loss=0.0002231, whisper_loss=0.09434, over 3901758.12 frames. 
], batch size: 58, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:44:42,525 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 21:44:50,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=760750.0, ans=0.125 2024-08-10 21:44:57,320 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 21:45:07,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=760850.0, ans=0.125 2024-08-10 21:45:13,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=760850.0, ans=0.125 2024-08-10 21:45:18,463 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS 2024-08-10 21:45:23,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.660e+01 3.011e+01 3.359e+01 4.667e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-10 21:45:27,832 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 from AS 2024-08-10 21:45:36,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-10 21:45:43,627 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3650, loss[loss=0.1138, beats_loss=0.0116, ecapa_loss=0.0002605, whisper_loss=0.0996, over 16526.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0118, ecapa_loss=0.0002216, whisper_loss=0.09403, over 3882235.83 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:45:43,854 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
27 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-10 21:46:00,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761250.0, ans=0.1 2024-08-10 21:46:01,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=761250.0, ans=0.2 2024-08-10 21:46:03,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=761250.0, ans=0.125 2024-08-10 21:46:14,621 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 21:46:16,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-08-10 21:46:31,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=761450.0, ans=0.125 2024-08-10 21:46:39,954 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 21:46:43,870 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 21:46:48,727 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3700, loss[loss=0.1196, beats_loss=0.01439, ecapa_loss=0.000169, whisper_loss=0.1035, over 19983.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01171, ecapa_loss=0.0002206, whisper_loss=0.09501, over 3877506.07 frames. ], batch size: 77, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:47:28,842 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 21:47:35,079 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.698e+01 3.015e+01 3.307e+01 5.689e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 21:47:49,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=762050.0, ans=0.125 2024-08-10 21:47:49,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=762050.0, ans=0.2 2024-08-10 21:47:55,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3750, loss[loss=0.09159, beats_loss=0.01268, ecapa_loss=0.0002266, whisper_loss=0.07665, over 22144.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01179, ecapa_loss=0.0002217, whisper_loss=0.0943, over 3874027.61 frames. ], batch size: 88, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:47:59,528 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 21:48:07,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=762250.0, ans=0.2 2024-08-10 21:48:15,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=762250.0, ans=0.0 2024-08-10 21:48:31,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=762350.0, ans=0.0 2024-08-10 21:48:37,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=12.0 2024-08-10 21:49:01,264 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3800, loss[loss=0.09153, beats_loss=0.01642, ecapa_loss=0.0001946, whisper_loss=0.07317, over 16747.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01182, ecapa_loss=0.0002236, whisper_loss=0.09354, over 3864404.02 frames. 
], batch size: 68, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:49:05,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762650.0, ans=0.1 2024-08-10 21:49:47,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.822e+01 3.123e+01 3.627e+01 5.849e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-10 21:49:50,212 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 21:50:07,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3850, loss[loss=0.1266, beats_loss=0.01343, ecapa_loss=0.0001694, whisper_loss=0.1115, over 22653.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01191, ecapa_loss=0.0002228, whisper_loss=0.09328, over 3896625.48 frames. ], batch size: 85, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:50:27,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=763250.0, ans=0.2 2024-08-10 21:50:36,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=763350.0, ans=0.0 2024-08-10 21:50:39,020 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 21:50:46,142 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 21:50:49,895 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 21:50:56,248 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 21:51:12,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3900, loss[loss=0.09162, beats_loss=0.01251, ecapa_loss=0.0002045, whisper_loss=0.07707, over 15222.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01182, ecapa_loss=0.0002241, whisper_loss=0.09471, over 3940126.60 frames. 
], batch size: 58, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:51:13,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=763650.0, ans=0.125 2024-08-10 21:51:17,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=763650.0, ans=0.125 2024-08-10 21:51:18,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=12.0 2024-08-10 21:51:20,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=763650.0, ans=0.2 2024-08-10 21:51:34,982 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 21:51:39,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=763850.0, ans=0.125 2024-08-10 21:51:48,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=763850.0, ans=0.07 2024-08-10 21:51:58,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.868e+01 3.190e+01 3.521e+01 6.195e+01, threshold=6.380e+01, percent-clipped=0.0 2024-08-10 21:52:12,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0 2024-08-10 21:52:17,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 3950, loss[loss=0.1159, beats_loss=0.01042, ecapa_loss=0.0002459, whisper_loss=0.103, over 22464.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0117, ecapa_loss=0.0002266, whisper_loss=0.0945, over 3917519.88 frames. ], batch size: 92, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:52:27,157 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-10 21:52:32,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=764250.0, ans=0.2 2024-08-10 21:53:01,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-10 21:53:01,924 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 21:53:16,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-10 21:53:24,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4000, loss[loss=0.1189, beats_loss=0.01098, ecapa_loss=0.0002187, whisper_loss=0.1057, over 17691.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01161, ecapa_loss=0.0002254, whisper_loss=0.09539, over 3902545.79 frames. ], batch size: 68, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:53:25,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=764650.0, ans=0.0 2024-08-10 21:53:26,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764650.0, ans=0.125 2024-08-10 21:53:29,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=764650.0, ans=0.125 2024-08-10 21:53:29,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=764650.0, ans=0.0 2024-08-10 21:53:32,867 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 21:54:00,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. 
limit=12.0 2024-08-10 21:54:02,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-08-10 21:54:09,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.764e+01 3.105e+01 3.573e+01 5.750e+01, threshold=6.210e+01, percent-clipped=0.0 2024-08-10 21:54:15,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=765050.0, ans=0.0 2024-08-10 21:54:28,607 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 21:54:28,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=765150.0, ans=0.2 2024-08-10 21:54:29,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4050, loss[loss=0.118, beats_loss=0.01045, ecapa_loss=0.0002256, whisper_loss=0.1053, over 22168.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01153, ecapa_loss=0.0002256, whisper_loss=0.0959, over 3912424.52 frames. ], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:54:36,217 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 21:54:36,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=765150.0, ans=10.0 2024-08-10 21:54:37,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=765150.0, ans=0.2 2024-08-10 21:54:45,100 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 21:54:57,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=765350.0, ans=0.0 2024-08-10 21:55:19,095 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 21:55:34,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4100, loss[loss=0.09418, beats_loss=0.01488, ecapa_loss=0.0001871, whisper_loss=0.07742, over 18657.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.0002247, whisper_loss=0.09488, over 3905069.39 frames. ], batch size: 75, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:55:35,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=765650.0, ans=0.125 2024-08-10 21:55:56,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2024-08-10 21:56:03,804 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-10 21:56:15,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=765950.0, ans=0.125 2024-08-10 21:56:17,010 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 21:56:20,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.758e+01 3.048e+01 3.457e+01 5.910e+01, threshold=6.096e+01, percent-clipped=0.0 2024-08-10 21:56:22,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765950.0, ans=0.1 2024-08-10 21:56:27,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=15.0 2024-08-10 21:56:36,735 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 12 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 21:56:40,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4150, loss[loss=0.1043, beats_loss=0.01109, ecapa_loss=0.0003059, whisper_loss=0.0901, over 15925.00 frames. 
], tot_loss[loss=0.1085, beats_loss=0.01167, ecapa_loss=0.0002253, whisper_loss=0.09462, over 3906004.22 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:56:52,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=766250.0, ans=0.125 2024-08-10 21:57:08,200 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 21:57:10,679 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-10 21:57:15,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.45 vs. limit=5.0 2024-08-10 21:57:21,244 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:57:27,521 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 21:57:29,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=766450.0, ans=0.125 2024-08-10 21:57:46,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4200, loss[loss=0.1055, beats_loss=0.01266, ecapa_loss=0.0002198, whisper_loss=0.09069, over 20547.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0117, ecapa_loss=0.0002247, whisper_loss=0.09452, over 3939783.52 frames. 
], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:57:49,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=766650.0, ans=0.0 2024-08-10 21:57:55,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766650.0, ans=0.125 2024-08-10 21:58:31,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.720e+01 3.062e+01 3.636e+01 5.115e+01, threshold=6.123e+01, percent-clipped=0.0 2024-08-10 21:58:34,633 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 21:58:42,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=767050.0, ans=0.125 2024-08-10 21:58:43,780 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 21:58:51,339 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4250, loss[loss=0.08164, beats_loss=0.01156, ecapa_loss=0.0002189, whisper_loss=0.06789, over 15749.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0117, ecapa_loss=0.0002225, whisper_loss=0.09449, over 3928549.46 frames. ], batch size: 63, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:58:58,488 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
33 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 21:59:15,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=767250.0, ans=0.125 2024-08-10 21:59:23,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=767350.0, ans=0.0 2024-08-10 21:59:56,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=767650.0, ans=0.0 2024-08-10 21:59:56,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-10 21:59:57,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4300, loss[loss=0.06095, beats_loss=0.01712, ecapa_loss=0.0002643, whisper_loss=0.04118, over 13394.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01173, ecapa_loss=0.0002216, whisper_loss=0.09402, over 3911939.57 frames. ], batch size: 60, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:00:03,652 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-10 22:00:21,648 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 22:00:24,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=767850.0, ans=0.125 2024-08-10 22:00:24,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-08-10 22:00:28,101 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 22:00:28,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=767850.0, ans=0.95 2024-08-10 22:00:43,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.655e+01 2.968e+01 3.386e+01 7.323e+01, threshold=5.937e+01, percent-clipped=2.0 2024-08-10 22:00:53,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=768050.0, ans=0.0 2024-08-10 22:01:03,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4350, loss[loss=0.1007, beats_loss=0.01253, ecapa_loss=0.0002642, whisper_loss=0.08555, over 17224.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01176, ecapa_loss=0.0002211, whisper_loss=0.09326, over 3908548.39 frames. ], batch size: 68, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:01:30,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=768350.0, ans=0.125 2024-08-10 22:01:38,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=768350.0, ans=0.125 2024-08-10 22:01:52,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=768450.0, ans=0.125 2024-08-10 22:02:08,803 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4400, loss[loss=0.1225, beats_loss=0.01023, ecapa_loss=0.0002751, whisper_loss=0.1095, over 18835.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01169, ecapa_loss=0.0002231, whisper_loss=0.09393, over 3914001.37 frames. ], batch size: 76, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:02:12,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2024-08-10 22:02:14,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-10 22:02:16,731 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 22:02:31,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-10 22:02:44,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=768850.0, ans=0.125 2024-08-10 22:02:50,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=768950.0, ans=0.125 2024-08-10 22:02:55,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.875e+01 3.279e+01 3.849e+01 6.433e+01, threshold=6.559e+01, percent-clipped=3.0 2024-08-10 22:03:14,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4450, loss[loss=0.1224, beats_loss=0.01007, ecapa_loss=0.000242, whisper_loss=0.1099, over 20044.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01165, ecapa_loss=0.0002221, whisper_loss=0.094, over 3933140.01 frames. ], batch size: 79, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:03:38,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2024-08-10 22:03:49,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769350.0, ans=0.125 2024-08-10 22:03:55,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769450.0, ans=0.125 2024-08-10 22:03:59,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=769450.0, ans=0.125 2024-08-10 22:04:12,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=769550.0, ans=0.0 2024-08-10 22:04:13,508 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 22:04:20,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4500, loss[loss=0.1066, beats_loss=0.01212, ecapa_loss=0.0002642, whisper_loss=0.09183, over 15990.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01174, ecapa_loss=0.0002219, whisper_loss=0.09294, over 3930269.63 frames. ], batch size: 67, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:04:23,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=769650.0, ans=0.125 2024-08-10 22:04:28,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=769650.0, ans=0.2 2024-08-10 22:04:29,245 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 22:04:30,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=769650.0, ans=0.0 2024-08-10 22:04:36,858 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 22:04:46,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=769850.0, ans=0.0 2024-08-10 22:05:02,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=769950.0, ans=0.2 2024-08-10 22:05:05,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.643e+01 3.102e+01 3.659e+01 7.014e+01, threshold=6.204e+01, percent-clipped=1.0 2024-08-10 22:05:06,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=769950.0, ans=0.2 2024-08-10 22:05:12,075 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 22:05:22,715 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 22:05:24,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=770150.0, ans=0.0 2024-08-10 22:05:25,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4550, loss[loss=0.09194, beats_loss=0.01116, ecapa_loss=0.0002345, whisper_loss=0.07843, over 16407.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01171, ecapa_loss=0.0002214, whisper_loss=0.09288, over 3872318.34 frames. ], batch size: 69, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:05:29,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-10 22:05:41,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=770250.0, ans=0.125 2024-08-10 22:05:42,261 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 22:05:47,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770250.0, ans=0.1 2024-08-10 22:05:48,648 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 22:06:11,881 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-10 22:06:17,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=770550.0, ans=0.125 2024-08-10 22:06:18,808 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 22:06:21,340 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-10 22:06:30,324 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4600, loss[loss=0.1261, beats_loss=0.01038, ecapa_loss=0.0002076, whisper_loss=0.1137, over 21957.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0116, ecapa_loss=0.0002222, whisper_loss=0.09403, over 3886093.38 frames. ], batch size: 81, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:06:31,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=770650.0, ans=0.0 2024-08-10 22:06:33,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=770650.0, ans=0.0 2024-08-10 22:06:35,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. 
limit=15.0 2024-08-10 22:07:00,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=770850.0, ans=0.125 2024-08-10 22:07:00,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770850.0, ans=0.1 2024-08-10 22:07:14,131 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 22:07:16,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.935e+01 3.290e+01 3.824e+01 6.429e+01, threshold=6.581e+01, percent-clipped=1.0 2024-08-10 22:07:28,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-10 22:07:34,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=771050.0, ans=0.125 2024-08-10 22:07:36,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4650, loss[loss=0.1241, beats_loss=0.009712, ecapa_loss=0.0002228, whisper_loss=0.1122, over 23100.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0002237, whisper_loss=0.09428, over 3891097.59 frames. ], batch size: 87, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:07:43,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=771150.0, ans=0.125 2024-08-10 22:07:53,952 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 22:08:02,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=771350.0, ans=0.05 2024-08-10 22:08:05,861 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
29 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 22:08:10,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=771350.0, ans=0.125 2024-08-10 22:08:34,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=771550.0, ans=22.5 2024-08-10 22:08:39,451 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:08:41,933 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 22:08:43,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4700, loss[loss=0.09908, beats_loss=0.01212, ecapa_loss=0.0002205, whisper_loss=0.08475, over 22572.00 frames. ], tot_loss[loss=0.108, beats_loss=0.0116, ecapa_loss=0.0002234, whisper_loss=0.09415, over 3904036.85 frames. ], batch size: 91, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:08:49,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=771650.0, ans=0.125 2024-08-10 22:09:04,725 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 22:09:14,172 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 22:09:16,267 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-10 22:09:19,635 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 22:09:21,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=771850.0, ans=0.2 2024-08-10 22:09:29,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.713e+01 3.049e+01 3.532e+01 5.514e+01, threshold=6.097e+01, percent-clipped=0.0 2024-08-10 22:09:38,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=772050.0, ans=0.0 2024-08-10 22:09:43,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772050.0, ans=0.125 2024-08-10 22:09:49,183 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4750, loss[loss=0.1194, beats_loss=0.007586, ecapa_loss=0.0002888, whisper_loss=0.1089, over 21911.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0116, ecapa_loss=0.0002236, whisper_loss=0.09427, over 3913658.35 frames. ], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:09:56,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=772150.0, ans=0.0 2024-08-10 22:09:57,707 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 22:09:58,888 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 22:09:59,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=772150.0, ans=0.125 2024-08-10 22:10:04,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=772250.0, ans=0.0 2024-08-10 22:10:08,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=772250.0, ans=0.07 2024-08-10 22:10:16,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=772350.0, ans=0.125 2024-08-10 22:10:24,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772350.0, ans=0.1 2024-08-10 22:10:27,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=772350.0, ans=0.2 2024-08-10 22:10:46,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=772550.0, ans=0.0 2024-08-10 22:10:55,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4800, loss[loss=0.07404, beats_loss=0.0137, ecapa_loss=0.0002909, whisper_loss=0.05743, over 14026.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01158, ecapa_loss=0.0002239, whisper_loss=0.09452, over 3921037.41 frames. 
], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:11:08,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=772750.0, ans=0.07 2024-08-10 22:11:23,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=772850.0, ans=0.125 2024-08-10 22:11:30,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=772850.0, ans=0.125 2024-08-10 22:11:41,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.673e+01 3.089e+01 3.492e+01 5.456e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-10 22:11:42,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=772950.0, ans=0.0 2024-08-10 22:11:58,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=12.0 2024-08-10 22:12:00,827 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4850, loss[loss=0.1107, beats_loss=0.01222, ecapa_loss=0.0002091, whisper_loss=0.09637, over 18427.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01162, ecapa_loss=0.0002238, whisper_loss=0.09461, over 3946318.22 frames. ], batch size: 73, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:12:19,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=773250.0, ans=0.0 2024-08-10 22:12:19,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773250.0, ans=0.1 2024-08-10 22:12:35,623 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 22:12:48,741 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-10 22:12:51,382 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 22:12:56,583 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 22:12:58,200 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-10 22:13:06,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4900, loss[loss=0.08303, beats_loss=0.01589, ecapa_loss=0.0001852, whisper_loss=0.06528, over 13353.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01158, ecapa_loss=0.0002246, whisper_loss=0.09461, over 3906070.58 frames. ], batch size: 54, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:13:09,591 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 22:13:19,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=773750.0, ans=0.125 2024-08-10 22:13:28,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773750.0, ans=0.0 2024-08-10 22:13:53,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.882e+01 3.230e+01 4.059e+01 7.454e+01, threshold=6.460e+01, percent-clipped=3.0 2024-08-10 22:13:57,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=773950.0, ans=0.0 2024-08-10 22:14:03,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=774050.0, ans=0.2 2024-08-10 22:14:12,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 4950, loss[loss=0.1182, beats_loss=0.01005, ecapa_loss=0.0002787, whisper_loss=0.1053, over 14033.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002261, whisper_loss=0.09476, over 3884989.46 frames. ], batch size: 56, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:14:19,179 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 22:14:58,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774450.0, ans=0.1 2024-08-10 22:15:13,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=774550.0, ans=0.035 2024-08-10 22:15:17,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=774650.0, ans=0.04949747468305833 2024-08-10 22:15:18,638 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5000, loss[loss=0.1037, beats_loss=0.01437, ecapa_loss=0.0002116, whisper_loss=0.08718, over 14785.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01163, ecapa_loss=0.0002262, whisper_loss=0.09426, over 3860524.60 frames. ], batch size: 62, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:15:24,151 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 22:15:34,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=774750.0, ans=0.95 2024-08-10 22:15:44,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774750.0, ans=0.1 2024-08-10 22:15:53,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=774850.0, ans=0.2 2024-08-10 22:15:59,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774950.0, ans=0.125 2024-08-10 22:16:07,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.646e+01 2.904e+01 3.171e+01 4.689e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-10 22:16:11,735 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 22:16:30,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5050, loss[loss=0.07861, beats_loss=0.01194, ecapa_loss=0.000203, whisper_loss=0.06464, over 19839.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01173, ecapa_loss=0.0002237, whisper_loss=0.0935, over 3846589.94 frames. ], batch size: 78, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:16:33,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775150.0, ans=0.125 2024-08-10 22:16:49,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775250.0, ans=0.125 2024-08-10 22:16:57,503 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
38 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 22:17:02,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=775350.0, ans=0.125 2024-08-10 22:17:13,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=775350.0, ans=0.2 2024-08-10 22:17:23,396 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 22:17:40,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=775550.0, ans=0.0 2024-08-10 22:17:47,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5100, loss[loss=0.09853, beats_loss=0.01166, ecapa_loss=0.0002489, whisper_loss=0.08439, over 21588.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01174, ecapa_loss=0.0002231, whisper_loss=0.09378, over 3850853.16 frames. ], batch size: 90, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:17:52,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=775650.0, ans=0.125 2024-08-10 22:17:53,329 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 22:18:34,118 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 22:18:43,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=775950.0, ans=0.0 2024-08-10 22:18:46,081 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 22:18:51,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.763e+01 3.180e+01 3.560e+01 6.035e+01, threshold=6.359e+01, percent-clipped=1.0 2024-08-10 22:18:53,293 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 22:18:56,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=775950.0, ans=0.0 2024-08-10 22:18:56,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=775950.0, ans=0.125 2024-08-10 22:18:58,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=776050.0, ans=0.07 2024-08-10 22:19:04,488 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-10 22:19:13,050 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 22:19:16,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5150, loss[loss=0.1097, beats_loss=0.01251, ecapa_loss=0.0002226, whisper_loss=0.09501, over 22146.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01173, ecapa_loss=0.0002226, whisper_loss=0.09426, over 3871106.62 frames. ], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:19:17,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=776150.0, ans=0.125 2024-08-10 22:19:21,630 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 22:19:39,718 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 22:19:58,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2024-08-10 22:20:13,173 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 22:20:47,070 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 22:21:00,395 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 22:21:03,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5200, loss[loss=0.1095, beats_loss=0.01219, ecapa_loss=0.0002186, whisper_loss=0.09513, over 22233.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01169, ecapa_loss=0.0002212, whisper_loss=0.09432, over 3866440.75 frames. ], batch size: 92, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:21:17,492 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 15 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 22:21:29,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2024-08-10 22:22:04,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.16 vs. limit=22.5 2024-08-10 22:22:11,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.788e+01 3.076e+01 3.692e+01 5.822e+01, threshold=6.152e+01, percent-clipped=0.0 2024-08-10 22:22:11,672 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 22:22:19,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=777050.0, ans=0.2 2024-08-10 22:22:34,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=777050.0, ans=0.125 2024-08-10 22:22:34,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.05 vs. 
limit=15.0 2024-08-10 22:22:42,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5250, loss[loss=0.1039, beats_loss=0.01367, ecapa_loss=0.000215, whisper_loss=0.08811, over 21229.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01169, ecapa_loss=0.0002215, whisper_loss=0.094, over 3887679.14 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:23:01,628 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 22:23:08,572 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 22:23:11,883 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 22:23:36,446 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 22:23:38,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=777350.0, ans=0.125 2024-08-10 22:23:40,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=777350.0, ans=0.07 2024-08-10 22:23:49,506 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 22:24:38,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5300, loss[loss=0.08171, beats_loss=0.01426, ecapa_loss=0.0001831, whisper_loss=0.06562, over 16492.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0116, ecapa_loss=0.0002218, whisper_loss=0.09459, over 3865224.18 frames. ], batch size: 67, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:24:56,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=777650.0, ans=0.125 2024-08-10 22:25:21,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. 
limit=15.0 2024-08-10 22:25:47,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=777850.0, ans=0.0 2024-08-10 22:26:02,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.845e+01 3.130e+01 3.652e+01 5.218e+01, threshold=6.259e+01, percent-clipped=0.0 2024-08-10 22:26:04,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.76 vs. limit=22.5 2024-08-10 22:26:37,458 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-10 22:26:38,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5350, loss[loss=0.1192, beats_loss=0.007798, ecapa_loss=0.0002386, whisper_loss=0.109, over 23689.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002199, whisper_loss=0.09463, over 3871906.23 frames. ], batch size: 93, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:26:39,170 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 22:27:03,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778250.0, ans=0.0 2024-08-10 22:27:40,501 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 22:28:27,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5400, loss[loss=0.1118, beats_loss=0.00958, ecapa_loss=0.0002447, whisper_loss=0.09981, over 13337.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0116, ecapa_loss=0.00022, whisper_loss=0.09515, over 3858720.78 frames. ], batch size: 55, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:28:39,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.13 vs. 
limit=12.0 2024-08-10 22:28:53,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778750.0, ans=0.1 2024-08-10 22:28:54,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-08-10 22:29:24,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.711e+01 3.100e+01 3.573e+01 5.377e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 22:29:32,678 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 22:29:38,313 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 22:29:49,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.95 vs. limit=22.5 2024-08-10 22:29:50,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5450, loss[loss=0.1009, beats_loss=0.01116, ecapa_loss=0.0002611, whisper_loss=0.08714, over 22474.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01153, ecapa_loss=0.0002209, whisper_loss=0.09508, over 3861194.73 frames. ], batch size: 94, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:29:51,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.62 vs. limit=5.0 2024-08-10 22:30:00,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=779150.0, ans=0.125 2024-08-10 22:30:06,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=779150.0, ans=0.125 2024-08-10 22:30:18,703 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
36 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-10 22:30:28,291 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 22:30:42,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779350.0, ans=0.1 2024-08-10 22:30:58,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779450.0, ans=0.125 2024-08-10 22:30:58,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=779450.0, ans=0.125 2024-08-10 22:30:58,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. limit=6.0 2024-08-10 22:30:59,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=779450.0, ans=0.0 2024-08-10 22:31:13,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=15.0 2024-08-10 22:31:23,647 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5500, loss[loss=0.112, beats_loss=0.009982, ecapa_loss=0.0002113, whisper_loss=0.09987, over 16619.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01151, ecapa_loss=0.0002208, whisper_loss=0.09563, over 3884778.99 frames. ], batch size: 62, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:32:05,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=779850.0, ans=0.0 2024-08-10 22:32:12,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=779850.0, ans=0.0 2024-08-10 22:32:19,642 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 22:32:28,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=779950.0, ans=0.125 2024-08-10 22:32:29,648 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.640e+01 3.152e+01 3.892e+01 6.209e+01, threshold=6.304e+01, percent-clipped=1.0 2024-08-10 22:32:41,739 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 22:32:52,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=780050.0, ans=0.2 2024-08-10 22:32:58,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5550, loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.0002325, whisper_loss=0.09107, over 20313.00 frames. ], tot_loss[loss=0.109, beats_loss=0.0115, ecapa_loss=0.0002225, whisper_loss=0.09523, over 3887383.68 frames. ], batch size: 85, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:33:25,266 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 22:33:33,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=780250.0, ans=0.05 2024-08-10 22:33:36,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. limit=5.0 2024-08-10 22:33:39,563 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 22:33:39,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=780350.0, ans=0.04949747468305833 2024-08-10 22:34:09,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. 
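Note the `grad_scale` change in the batch-5550 line: the value is a power of two that the AMP dynamic loss scaler grows when training is stable, moving here from 2^43 to 2^44 (the config enables `use_amp` with bf16). Illustrative arithmetic only:

```python
# grad_scale values observed in this log are exact powers of two,
# consistent with dynamic loss scaling doubling the scale factor.
assert 8796093022208.0 == 2.0 ** 43    # grad_scale up to batch ~5500
assert 17592186044416.0 == 2.0 ** 44   # grad_scale from batch 5550 on
assert 17592186044416.0 == 2 * 8796093022208.0
```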
limit=10.0 2024-08-10 22:34:16,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780550.0, ans=0.1 2024-08-10 22:34:17,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=780550.0, ans=0.2 2024-08-10 22:34:32,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-10 22:34:33,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5600, loss[loss=0.1066, beats_loss=0.01344, ecapa_loss=0.0001992, whisper_loss=0.09115, over 20297.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01158, ecapa_loss=0.0002224, whisper_loss=0.09485, over 3915088.58 frames. ], batch size: 83, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:34:49,871 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 22:34:55,393 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 22:35:28,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. 
limit=6.0 2024-08-10 22:35:33,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=780950.0, ans=0.0 2024-08-10 22:35:36,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.826e+01 3.158e+01 3.731e+01 5.525e+01, threshold=6.316e+01, percent-clipped=0.0 2024-08-10 22:35:52,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=781050.0, ans=0.09899494936611666 2024-08-10 22:36:04,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5650, loss[loss=0.1234, beats_loss=0.009316, ecapa_loss=0.0002531, whisper_loss=0.1116, over 15905.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01161, ecapa_loss=0.0002215, whisper_loss=0.09465, over 3911071.32 frames. ], batch size: 61, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:37:19,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=781550.0, ans=0.125 2024-08-10 22:37:20,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=781550.0, ans=0.0 2024-08-10 22:37:33,207 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 22:37:33,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=781650.0, ans=0.2 2024-08-10 22:37:35,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5700, loss[loss=0.1039, beats_loss=0.01185, ecapa_loss=0.00022, whisper_loss=0.0898, over 21760.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01161, ecapa_loss=0.000221, whisper_loss=0.09527, over 3931822.77 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:37:38,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2024-08-10 22:38:06,671 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 22:38:14,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781850.0, ans=0.1 2024-08-10 22:38:40,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.917e+01 3.187e+01 3.836e+01 6.311e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 22:38:45,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=781950.0, ans=0.2 2024-08-10 22:38:50,020 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 22:38:54,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=782050.0, ans=0.125 2024-08-10 22:39:06,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5750, loss[loss=0.1045, beats_loss=0.01143, ecapa_loss=0.0002242, whisper_loss=0.09085, over 16834.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01164, ecapa_loss=0.0002219, whisper_loss=0.09529, over 3932542.13 frames. ], batch size: 68, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:39:17,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.54 vs. 
limit=10.0 2024-08-10 22:39:22,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782150.0, ans=0.1 2024-08-10 22:39:53,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=782350.0, ans=0.125 2024-08-10 22:39:59,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. limit=10.0 2024-08-10 22:40:00,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782450.0, ans=0.0 2024-08-10 22:40:13,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-10 22:40:18,007 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 22:40:27,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=782550.0, ans=0.2 2024-08-10 22:40:39,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5800, loss[loss=0.1139, beats_loss=0.01041, ecapa_loss=0.0002202, whisper_loss=0.1013, over 20994.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01172, ecapa_loss=0.0002223, whisper_loss=0.09424, over 3906961.98 frames. ], batch size: 81, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:40:45,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=782650.0, ans=0.07 2024-08-10 22:40:53,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=782650.0, ans=0.125 2024-08-10 22:41:00,259 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-10 22:41:09,042 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 22:41:11,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=782750.0, ans=0.05 2024-08-10 22:41:12,265 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 22:41:40,282 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-10 22:41:44,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.672e+01 3.034e+01 3.531e+01 4.962e+01, threshold=6.068e+01, percent-clipped=0.0 2024-08-10 22:42:01,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-10 22:42:02,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783050.0, ans=0.1 2024-08-10 22:42:12,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5850, loss[loss=0.103, beats_loss=0.01438, ecapa_loss=0.000227, whisper_loss=0.08638, over 18680.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01178, ecapa_loss=0.0002209, whisper_loss=0.09383, over 3920352.64 frames. ], batch size: 79, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:42:29,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=783250.0, ans=0.0 2024-08-10 22:42:42,527 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 22:42:47,606 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
24 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 22:43:05,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=783450.0, ans=0.125 2024-08-10 22:43:34,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=783550.0, ans=0.0 2024-08-10 22:43:38,147 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 22:43:41,770 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5900, loss[loss=0.0847, beats_loss=0.01411, ecapa_loss=0.000225, whisper_loss=0.06834, over 18429.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01172, ecapa_loss=0.0002213, whisper_loss=0.09364, over 3880458.32 frames. ], batch size: 78, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:43:47,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783650.0, ans=0.1 2024-08-10 22:44:06,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=783750.0, ans=0.125 2024-08-10 22:44:10,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=783750.0, ans=0.0 2024-08-10 22:44:13,719 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 22:44:14,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=783750.0, ans=0.2 2024-08-10 22:44:40,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=783950.0, ans=0.0 2024-08-10 22:44:46,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=783950.0, ans=0.125 2024-08-10 22:44:47,402 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.739e+01 3.064e+01 3.610e+01 4.850e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-10 22:44:51,368 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 22:45:02,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=784050.0, ans=0.125 2024-08-10 22:45:06,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=784050.0, ans=0.125 2024-08-10 22:45:15,207 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 5950, loss[loss=0.1072, beats_loss=0.01043, ecapa_loss=0.0002264, whisper_loss=0.09451, over 21947.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01181, ecapa_loss=0.0002201, whisper_loss=0.09297, over 3894049.93 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:45:15,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0 2024-08-10 22:45:26,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=784150.0, ans=0.1 2024-08-10 22:45:27,820 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 22:45:47,759 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 22:45:52,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=784350.0, ans=0.125 2024-08-10 22:45:58,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2024-08-10 22:46:36,308 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 22:46:43,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=12.0 2024-08-10 22:46:46,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6000, loss[loss=0.09201, beats_loss=0.01133, ecapa_loss=0.0002249, whisper_loss=0.07843, over 17519.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01181, ecapa_loss=0.000221, whisper_loss=0.0934, over 3895893.83 frames. ], batch size: 69, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:46:46,112 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-10 22:47:25,512 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on ASR_libri: loss=0.2592, beats_loss=0, ecapa_loss=0.0006893, whisper_loss=0.2523, over 922467.00 frames. 2024-08-10 22:47:43,956 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on SV_voxceleb1: loss=0.005715, beats_loss=0, ecapa_loss=0.0005715, whisper_loss=0, over 939242.00 frames. 2024-08-10 22:49:35,333 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on AT_audioset: loss=0.02616, beats_loss=0.02616, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 22:49:35,337 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-10 22:49:45,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-10 22:50:18,921 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 36 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 22:50:20,844 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 22:50:21,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=784850.0, ans=0.2 2024-08-10 22:50:30,607 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 22:50:33,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.532e+01 2.986e+01 3.661e+01 5.128e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-10 22:51:00,036 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6050, loss[loss=0.09772, beats_loss=0.01019, ecapa_loss=0.0002514, whisper_loss=0.08502, over 20078.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01174, ecapa_loss=0.0002212, whisper_loss=0.09393, over 3891307.22 frames. ], batch size: 82, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:51:30,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=785250.0, ans=0.2 2024-08-10 22:51:42,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=785350.0, ans=0.0 2024-08-10 22:51:53,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=785350.0, ans=0.125 2024-08-10 22:52:04,008 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
24 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 22:52:18,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=785550.0, ans=0.125 2024-08-10 22:52:31,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-10 22:52:32,648 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 22:52:35,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6100, loss[loss=0.1101, beats_loss=0.01207, ecapa_loss=0.000223, whisper_loss=0.0958, over 14945.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01174, ecapa_loss=0.0002225, whisper_loss=0.09367, over 3876657.32 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:52:41,812 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-10 22:53:15,092 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 22:53:36,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.867e+01 3.222e+01 3.705e+01 5.709e+01, threshold=6.445e+01, percent-clipped=0.0 2024-08-10 22:53:44,951 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 22:53:58,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=786050.0, ans=0.2 2024-08-10 22:54:01,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-08-10 22:54:05,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6150, loss[loss=0.1212, beats_loss=0.01058, ecapa_loss=0.0002049, whisper_loss=0.1086, over 23632.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01176, ecapa_loss=0.0002227, whisper_loss=0.0941, over 3931268.60 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:54:24,763 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 22:54:41,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786350.0, ans=0.1 2024-08-10 22:54:42,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=786350.0, ans=0.0 2024-08-10 22:54:46,905 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 22:54:47,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-10 22:54:48,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=786350.0, ans=0.125 2024-08-10 22:54:57,700 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 22:55:05,984 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-10 22:55:32,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6200, loss[loss=0.09406, beats_loss=0.01343, ecapa_loss=0.000184, whisper_loss=0.07879, over 20521.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01176, ecapa_loss=0.0002218, whisper_loss=0.09397, over 3929043.85 frames. ], batch size: 81, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:55:49,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. 
limit=10.0 2024-08-10 22:55:55,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2024-08-10 22:56:00,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=786750.0, ans=0.0 2024-08-10 22:56:01,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=786750.0, ans=0.0 2024-08-10 22:56:21,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=786950.0, ans=0.2 2024-08-10 22:56:31,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.713e+01 3.021e+01 3.323e+01 5.362e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-10 22:56:43,114 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 22:56:57,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6250, loss[loss=0.1248, beats_loss=0.009974, ecapa_loss=0.0003156, whisper_loss=0.1117, over 22007.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01171, ecapa_loss=0.0002211, whisper_loss=0.09453, over 3936047.53 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:57:15,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=787250.0, ans=0.2 2024-08-10 22:57:29,324 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 22:57:44,848 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 22:58:20,969 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6300, loss[loss=0.1309, beats_loss=0.008992, ecapa_loss=0.0003248, whisper_loss=0.1187, over 13241.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.0117, ecapa_loss=0.0002228, whisper_loss=0.09368, over 3907199.24 frames. ], batch size: 53, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:58:26,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=787650.0, ans=0.125 2024-08-10 22:58:32,149 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 22:58:34,969 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 22:59:02,860 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 22:59:10,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=787950.0, ans=0.0 2024-08-10 22:59:12,056 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 22:59:14,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=787950.0, ans=0.0 2024-08-10 22:59:15,518 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-10 22:59:17,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2024-08-10 22:59:19,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.845e+01 3.065e+01 3.583e+01 5.394e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-10 22:59:24,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5 2024-08-10 22:59:40,214 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 22:59:43,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=788150.0, ans=0.125 2024-08-10 22:59:44,785 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6350, loss[loss=0.1089, beats_loss=0.01116, ecapa_loss=0.000217, whisper_loss=0.09553, over 18865.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01172, ecapa_loss=0.0002224, whisper_loss=0.09375, over 3855572.49 frames. ], batch size: 76, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:59:45,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788150.0, ans=0.1 2024-08-10 22:59:47,983 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 23:00:01,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2024-08-10 23:00:03,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=788250.0, ans=0.125 2024-08-10 23:00:05,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2024-08-10 23:00:18,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=788350.0, ans=0.125 2024-08-10 23:00:33,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. 
limit=15.0 2024-08-10 23:00:42,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=788450.0, ans=0.0 2024-08-10 23:01:09,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6400, loss[loss=0.0776, beats_loss=0.01166, ecapa_loss=0.000289, whisper_loss=0.06305, over 12993.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0117, ecapa_loss=0.0002217, whisper_loss=0.09388, over 3861407.38 frames. ], batch size: 57, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:01:20,423 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 23:02:07,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.821e+01 3.135e+01 3.560e+01 4.755e+01, threshold=6.269e+01, percent-clipped=0.0 2024-08-10 23:02:07,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=788950.0, ans=0.125 2024-08-10 23:02:15,991 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 23:02:32,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6450, loss[loss=0.1195, beats_loss=0.01017, ecapa_loss=0.0002151, whisper_loss=0.1072, over 14825.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01182, ecapa_loss=0.0002208, whisper_loss=0.09328, over 3879023.08 frames. ], batch size: 57, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:02:45,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=789150.0, ans=0.2 2024-08-10 23:02:54,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-08-10 23:03:00,076 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 23:03:24,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789450.0, ans=0.1 2024-08-10 23:03:39,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=789550.0, ans=0.2 2024-08-10 23:03:54,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6500, loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0002426, whisper_loss=0.09147, over 19069.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0118, ecapa_loss=0.0002199, whisper_loss=0.09405, over 3861766.73 frames. ], batch size: 78, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:04:16,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=789750.0, ans=0.1 2024-08-10 23:04:26,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=789750.0, ans=0.07 2024-08-10 23:04:43,593 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 23:04:53,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=789950.0, ans=22.5 2024-08-10 23:04:54,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-08-10 23:04:55,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.899e+01 3.223e+01 3.887e+01 5.763e+01, threshold=6.447e+01, percent-clipped=0.0 2024-08-10 23:04:56,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=789950.0, ans=0.125 2024-08-10 23:05:01,460 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 23:05:19,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6550, loss[loss=0.09332, beats_loss=0.0131, ecapa_loss=0.0002498, whisper_loss=0.07772, over 14741.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01178, ecapa_loss=0.0002188, whisper_loss=0.09416, over 3861675.09 frames. ], batch size: 60, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:05:23,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790150.0, ans=0.125 2024-08-10 23:05:30,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2024-08-10 23:05:51,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2024-08-10 23:06:01,903 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 23:06:02,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790350.0, ans=0.125 2024-08-10 23:06:07,315 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 23:06:21,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2024-08-10 23:06:42,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6600, loss[loss=0.1196, beats_loss=0.01079, ecapa_loss=0.0002443, whisper_loss=0.1064, over 22653.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01173, ecapa_loss=0.000219, whisper_loss=0.09514, over 3910359.24 frames. 
], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:06:57,722 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 23:06:59,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=790750.0, ans=0.125 2024-08-10 23:07:04,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2024-08-10 23:07:29,052 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 23:07:38,335 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.957e+01 3.211e+01 3.827e+01 6.878e+01, threshold=6.422e+01, percent-clipped=2.0 2024-08-10 23:07:58,077 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 41 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 23:08:02,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6650, loss[loss=0.1303, beats_loss=0.01002, ecapa_loss=0.0002518, whisper_loss=0.1178, over 21823.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01168, ecapa_loss=0.0002207, whisper_loss=0.09511, over 3934115.88 frames. ], batch size: 88, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:08:30,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2024-08-10 23:08:36,522 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 23:09:09,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791550.0, ans=0.1 2024-08-10 23:09:17,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=791550.0, ans=0.0 2024-08-10 23:09:21,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2024-08-10 23:09:22,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6700, loss[loss=0.08033, beats_loss=0.01561, ecapa_loss=0.0002279, whisper_loss=0.06244, over 21846.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01172, ecapa_loss=0.0002201, whisper_loss=0.09463, over 3935351.04 frames. ], batch size: 92, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:09:24,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5 2024-08-10 23:09:49,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=791750.0, ans=0.125 2024-08-10 23:09:53,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=791850.0, ans=0.0 2024-08-10 23:10:15,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.716e+01 3.104e+01 3.606e+01 5.024e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-10 23:10:29,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=792050.0, ans=0.0 2024-08-10 23:10:37,756 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6750, loss[loss=0.1068, beats_loss=0.01165, ecapa_loss=0.0002172, whisper_loss=0.09301, over 21696.00 frames. 
], tot_loss[loss=0.1087, beats_loss=0.01172, ecapa_loss=0.0002189, whisper_loss=0.09478, over 3921697.36 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:10:44,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=792150.0, ans=0.0 2024-08-10 23:10:47,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=792150.0, ans=0.0 2024-08-10 23:10:51,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=792250.0, ans=0.125 2024-08-10 23:11:08,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=792350.0, ans=0.125 2024-08-10 23:11:11,505 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-10 23:11:28,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-08-10 23:11:54,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6800, loss[loss=0.1071, beats_loss=0.01128, ecapa_loss=0.0002426, whisper_loss=0.09336, over 17580.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01168, ecapa_loss=0.0002203, whisper_loss=0.09495, over 3918409.55 frames. ], batch size: 72, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:12:07,739 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-10 23:12:10,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=792750.0, ans=0.0 2024-08-10 23:12:16,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=22.5 2024-08-10 23:12:35,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=792850.0, ans=0.125 2024-08-10 23:12:37,960 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-10 23:12:39,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=792950.0, ans=0.125 2024-08-10 23:12:46,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.823e+01 3.224e+01 3.746e+01 6.225e+01, threshold=6.449e+01, percent-clipped=1.0 2024-08-10 23:12:52,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=792950.0, ans=0.0 2024-08-10 23:12:53,540 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 23:13:01,096 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:13:09,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-10 23:13:10,772 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6850, loss[loss=0.1003, beats_loss=0.01576, ecapa_loss=0.0001959, whisper_loss=0.08256, over 21118.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.000222, whisper_loss=0.09489, over 3893866.33 frames. 
], batch size: 87, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:13:20,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2024-08-10 23:13:21,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=793150.0, ans=0.125 2024-08-10 23:13:27,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=12.0 2024-08-10 23:13:29,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2024-08-10 23:13:29,648 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 23:13:54,198 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 23:13:58,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=793450.0, ans=0.125 2024-08-10 23:14:06,112 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 23:14:17,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=793550.0, ans=0.0 2024-08-10 23:14:20,257 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 23:14:28,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6900, loss[loss=0.07623, beats_loss=0.01357, ecapa_loss=0.0002007, whisper_loss=0.06065, over 18631.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01156, ecapa_loss=0.000222, whisper_loss=0.09479, over 3882849.11 frames. 
], batch size: 76, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:14:35,010 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-08-10 23:15:13,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=793950.0, ans=0.125 2024-08-10 23:15:20,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.839e+01 3.157e+01 3.612e+01 7.302e+01, threshold=6.314e+01, percent-clipped=1.0 2024-08-10 23:15:22,125 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 23:15:23,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=793950.0, ans=0.125 2024-08-10 23:15:26,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:41,024 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 23:15:42,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 6950, loss[loss=0.09858, beats_loss=0.01217, ecapa_loss=0.0002019, whisper_loss=0.08439, over 15731.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01158, ecapa_loss=0.0002219, whisper_loss=0.09469, over 3862045.34 frames. ], batch size: 62, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:15:46,938 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 23:15:47,291 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.077e+00 2024-08-10 23:15:53,717 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 23:15:54,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-10 23:16:05,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2024-08-10 23:16:14,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=794350.0, ans=0.125 2024-08-10 23:16:15,612 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-10 23:16:33,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=794450.0, ans=0.0 2024-08-10 23:16:39,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=794450.0, ans=0.125 2024-08-10 23:16:47,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=794550.0, ans=0.2 2024-08-10 23:16:49,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=794550.0, ans=0.125 2024-08-10 23:16:56,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794650.0, ans=0.1 2024-08-10 23:16:57,099 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7000, loss[loss=0.1194, beats_loss=0.009236, ecapa_loss=0.0002946, whisper_loss=0.1073, over 21813.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.0002219, whisper_loss=0.09486, over 3883752.97 frames. ], batch size: 93, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:17:05,186 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 23:17:16,937 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 23:17:19,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=794750.0, ans=0.0 2024-08-10 23:17:24,312 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 23:17:32,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=794850.0, ans=0.0 2024-08-10 23:17:49,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.681e+01 2.983e+01 3.369e+01 6.385e+01, threshold=5.967e+01, percent-clipped=1.0 2024-08-10 23:17:54,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794950.0, ans=0.1 2024-08-10 23:18:06,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=795050.0, ans=0.5 2024-08-10 23:18:06,923 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.945e-01 2024-08-10 23:18:12,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7050, loss[loss=0.08328, beats_loss=0.01381, ecapa_loss=0.0002282, whisper_loss=0.06719, over 20631.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01163, ecapa_loss=0.0002225, whisper_loss=0.09456, over 3886921.10 frames. 
], batch size: 88, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:18:34,515 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.470e+05 2024-08-10 23:18:48,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=795350.0, ans=0.125 2024-08-10 23:19:05,353 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 23:19:09,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=795450.0, ans=0.125 2024-08-10 23:19:28,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7100, loss[loss=0.08404, beats_loss=0.01397, ecapa_loss=0.0001806, whisper_loss=0.06826, over 18816.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01168, ecapa_loss=0.0002199, whisper_loss=0.09454, over 3877760.82 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:19:37,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=795650.0, ans=0.125 2024-08-10 23:19:51,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-10 23:19:54,586 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 23:19:57,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795850.0, ans=0.125 2024-08-10 23:20:10,797 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
17 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:20:15,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=795950.0, ans=0.1 2024-08-10 23:20:22,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.587e+01 2.924e+01 3.368e+01 5.025e+01, threshold=5.848e+01, percent-clipped=0.0 2024-08-10 23:20:31,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=796050.0, ans=0.0 2024-08-10 23:20:46,497 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7150, loss[loss=0.1158, beats_loss=0.0109, ecapa_loss=0.0001996, whisper_loss=0.1029, over 16233.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01165, ecapa_loss=0.0002192, whisper_loss=0.09485, over 3871333.80 frames. ], batch size: 59, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:20:49,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=796150.0, ans=0.2 2024-08-10 23:20:51,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=796150.0, ans=0.125 2024-08-10 23:20:58,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=796150.0, ans=0.125 2024-08-10 23:21:25,836 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 23:21:28,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=796350.0, ans=0.125 2024-08-10 23:21:44,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. 
limit=22.5 2024-08-10 23:21:53,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=12.0 2024-08-10 23:22:00,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7200, loss[loss=0.1081, beats_loss=0.009032, ecapa_loss=0.0002529, whisper_loss=0.0965, over 17714.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0116, ecapa_loss=0.0002201, whisper_loss=0.09556, over 3907781.19 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:22:30,089 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:22:48,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=796950.0, ans=0.125 2024-08-10 23:22:50,751 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 23:22:54,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.940e+01 3.360e+01 3.850e+01 6.660e+01, threshold=6.719e+01, percent-clipped=3.0 2024-08-10 23:22:56,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=796950.0, ans=0.035 2024-08-10 23:22:58,079 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 23:23:06,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2024-08-10 23:23:15,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=797050.0, ans=0.2 2024-08-10 23:23:16,666 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:23:17,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7250, loss[loss=0.1249, beats_loss=0.009391, ecapa_loss=0.0002771, whisper_loss=0.1128, over 14793.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01158, ecapa_loss=0.0002215, whisper_loss=0.09534, over 3868353.10 frames. ], batch size: 61, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:23:22,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=797150.0, ans=0.0 2024-08-10 23:23:28,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=797150.0, ans=0.125 2024-08-10 23:23:30,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=29.11 vs. limit=15.0 2024-08-10 23:23:31,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-10 23:23:38,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=797250.0, ans=0.0 2024-08-10 23:23:47,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=797350.0, ans=0.2 2024-08-10 23:23:58,249 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 23:24:02,650 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 23:24:10,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797450.0, ans=0.1 2024-08-10 23:24:13,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=797450.0, ans=0.125 2024-08-10 23:24:29,941 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 23:24:31,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7300, loss[loss=0.09675, beats_loss=0.01148, ecapa_loss=0.0002633, whisper_loss=0.08263, over 17614.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01154, ecapa_loss=0.000222, whisper_loss=0.09474, over 3882710.57 frames. ], batch size: 74, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:24:31,705 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-10 23:24:56,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797750.0, ans=0.125 2024-08-10 23:24:57,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=797750.0, ans=0.2 2024-08-10 23:25:09,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=797850.0, ans=0.125 2024-08-10 23:25:14,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=797950.0, ans=0.125 2024-08-10 23:25:17,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=797950.0, ans=0.125 2024-08-10 23:25:22,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.736e+01 3.135e+01 3.639e+01 8.330e+01, threshold=6.270e+01, 
percent-clipped=2.0 2024-08-10 23:25:30,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=15.0 2024-08-10 23:25:34,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798050.0, ans=0.125 2024-08-10 23:25:38,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=798050.0, ans=0.07 2024-08-10 23:25:43,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7350, loss[loss=0.09484, beats_loss=0.01469, ecapa_loss=0.0002033, whisper_loss=0.07811, over 22540.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0115, ecapa_loss=0.0002236, whisper_loss=0.09533, over 3893058.23 frames. ], batch size: 95, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:25:44,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=798150.0, ans=0.125 2024-08-10 23:25:55,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=798150.0, ans=0.125 2024-08-10 23:26:09,037 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.946e-02 2024-08-10 23:26:11,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798350.0, ans=0.1 2024-08-10 23:26:17,477 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 23:26:36,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=798450.0, ans=0.04949747468305833 2024-08-10 23:26:42,809 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
19 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 23:26:44,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=798550.0, ans=0.2 2024-08-10 23:26:47,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=798550.0, ans=0.1 2024-08-10 23:26:49,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-08-10 23:26:54,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7400, loss[loss=0.1002, beats_loss=0.01345, ecapa_loss=0.0001846, whisper_loss=0.08489, over 16861.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01157, ecapa_loss=0.0002219, whisper_loss=0.09454, over 3848225.97 frames. ], batch size: 65, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:27:02,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=798650.0, ans=6.0 2024-08-10 23:27:04,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=798650.0, ans=0.2 2024-08-10 23:27:04,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798650.0, ans=0.1 2024-08-10 23:27:12,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.96 vs. 
limit=22.5 2024-08-10 23:27:26,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=798850.0, ans=0.0 2024-08-10 23:27:28,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=798850.0, ans=0.125 2024-08-10 23:27:42,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.647e+01 3.053e+01 3.534e+01 7.826e+01, threshold=6.106e+01, percent-clipped=2.0 2024-08-10 23:27:45,622 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 21 from LS+wenet, 29 from Vox, 46 fro AS 2024-08-10 23:27:46,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-10 23:27:53,532 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 24 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-10 23:28:01,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=799050.0, ans=0.125 2024-08-10 23:28:04,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7450, loss[loss=0.08707, beats_loss=0.01577, ecapa_loss=0.000181, whisper_loss=0.0695, over 19271.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01157, ecapa_loss=0.0002221, whisper_loss=0.09453, over 3869660.98 frames. ], batch size: 79, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:28:04,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=799150.0, ans=0.125 2024-08-10 23:28:05,679 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 23:28:15,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=799150.0, ans=0.125 2024-08-10 23:28:19,034 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:28:45,618 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-10 23:28:45,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=799450.0, ans=0.125 2024-08-10 23:28:59,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=799550.0, ans=0.0 2024-08-10 23:29:06,471 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 23:29:06,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799550.0, ans=0.1 2024-08-10 23:29:12,738 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7500, loss[loss=0.1259, beats_loss=0.009829, ecapa_loss=0.0002252, whisper_loss=0.1138, over 23004.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01155, ecapa_loss=0.0002221, whisper_loss=0.09475, over 3868303.21 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:29:14,129 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 23:29:19,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=799650.0, ans=0.05 2024-08-10 23:29:29,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799750.0, ans=0.1 2024-08-10 23:29:31,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=799750.0, ans=0.95 2024-08-10 23:29:45,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-10 23:29:49,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799850.0, ans=0.1 2024-08-10 23:29:54,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=799950.0, ans=0.0 2024-08-10 23:29:57,883 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 23:30:04,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.761e+01 3.186e+01 3.767e+01 5.987e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 23:30:17,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-10 23:30:25,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7550, loss[loss=0.1184, beats_loss=0.0125, ecapa_loss=0.0001684, whisper_loss=0.1042, over 17323.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01153, ecapa_loss=0.0002216, whisper_loss=0.09465, over 3833903.53 frames. 
], batch size: 64, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:31:12,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=800450.0, ans=0.125 2024-08-10 23:31:22,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-10 23:31:31,342 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 28 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-10 23:31:31,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800550.0, ans=0.0 2024-08-10 23:31:34,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=800550.0, ans=0.05 2024-08-10 23:31:38,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7600, loss[loss=0.09489, beats_loss=0.01173, ecapa_loss=0.0002236, whisper_loss=0.08093, over 21604.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002208, whisper_loss=0.09425, over 3821737.30 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:31:39,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.95 vs. 
limit=12.0 2024-08-10 23:31:40,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=800650.0, ans=0.125 2024-08-10 23:31:42,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=800650.0, ans=0.125 2024-08-10 23:31:47,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=800650.0, ans=0.0 2024-08-10 23:31:50,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800650.0, ans=0.1 2024-08-10 23:32:00,422 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 16 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 23:32:05,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800750.0, ans=0.1 2024-08-10 23:32:20,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2024-08-10 23:32:30,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.785e+01 3.066e+01 3.767e+01 8.128e+01, threshold=6.132e+01, percent-clipped=1.0 2024-08-10 23:32:39,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801050.0, ans=0.1 2024-08-10 23:32:50,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=801150.0, ans=0.07 2024-08-10 23:32:51,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7650, loss[loss=0.1427, beats_loss=0.007697, ecapa_loss=0.0001983, whisper_loss=0.133, over 24674.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0116, ecapa_loss=0.0002207, whisper_loss=0.09413, over 3849910.29 frames. 
], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:33:01,236 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 23:33:22,042 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 23:33:24,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801350.0, ans=0.125 2024-08-10 23:33:27,048 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 23:33:30,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-10 23:33:34,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=801450.0, ans=0.125 2024-08-10 23:33:44,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=801450.0, ans=0.125 2024-08-10 23:33:56,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801550.0, ans=0.1 2024-08-10 23:34:01,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7700, loss[loss=0.109, beats_loss=0.01289, ecapa_loss=0.0001925, whisper_loss=0.09414, over 16620.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002213, whisper_loss=0.09409, over 3866003.97 frames. ], batch size: 61, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:34:03,166 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 23:34:27,548 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 23:34:34,552 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
30 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 23:34:37,091 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 23:34:40,935 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 23:34:50,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.824e+01 3.342e+01 3.789e+01 5.468e+01, threshold=6.684e+01, percent-clipped=0.0 2024-08-10 23:34:55,260 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 23:34:59,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802050.0, ans=0.1 2024-08-10 23:35:08,405 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 23:35:11,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7750, loss[loss=0.09743, beats_loss=0.01049, ecapa_loss=0.0002038, whisper_loss=0.0849, over 17915.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01158, ecapa_loss=0.0002219, whisper_loss=0.09409, over 3893640.61 frames. ], batch size: 69, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:35:14,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802150.0, ans=0.125 2024-08-10 23:35:14,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=802150.0, ans=0.04949747468305833 2024-08-10 23:35:19,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=802150.0, ans=15.0 2024-08-10 23:35:21,471 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 23:35:31,914 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 23:36:00,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=802450.0, ans=0.0 2024-08-10 23:36:04,355 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 23:36:09,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802550.0, ans=0.125 2024-08-10 23:36:18,997 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 23:36:22,522 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 23:36:25,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7800, loss[loss=0.1318, beats_loss=0.009237, ecapa_loss=0.0002485, whisper_loss=0.12, over 15269.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01161, ecapa_loss=0.0002199, whisper_loss=0.09378, over 3855089.69 frames. ], batch size: 63, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:36:47,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=802750.0, ans=0.2 2024-08-10 23:37:14,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.895e+01 3.316e+01 3.988e+01 7.505e+01, threshold=6.631e+01, percent-clipped=2.0 2024-08-10 23:37:20,427 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.701e-03 2024-08-10 23:37:35,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=803150.0, ans=0.0 2024-08-10 23:37:35,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7850, loss[loss=0.1186, beats_loss=0.01336, ecapa_loss=0.000183, whisper_loss=0.1034, over 19583.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01165, ecapa_loss=0.0002215, whisper_loss=0.0933, over 3851421.10 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:37:40,099 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 23:37:48,670 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 23:38:01,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=803250.0, ans=0.125 2024-08-10 23:38:02,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=803350.0, ans=0.2 2024-08-10 23:38:05,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=803350.0, ans=0.125 2024-08-10 23:38:20,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=803450.0, ans=0.125 2024-08-10 23:38:28,552 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 23:38:31,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=803550.0, ans=0.2 2024-08-10 23:38:35,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-10 23:38:47,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7900, loss[loss=0.137, beats_loss=0.01022, ecapa_loss=0.0002495, whisper_loss=0.1243, over 22427.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01179, ecapa_loss=0.0002188, whisper_loss=0.09345, over 3866325.47 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:38:51,768 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 23:38:53,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803650.0, ans=0.1 2024-08-10 23:39:07,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=803750.0, ans=0.0 2024-08-10 23:39:16,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=803850.0, ans=0.0 2024-08-10 23:39:37,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.898e+01 3.197e+01 3.826e+01 5.899e+01, threshold=6.393e+01, percent-clipped=0.0 2024-08-10 23:39:45,271 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 23:39:48,433 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-10 23:39:48,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0 2024-08-10 23:39:58,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 7950, loss[loss=0.1251, beats_loss=0.009898, ecapa_loss=0.0002149, whisper_loss=0.1131, over 17107.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01177, ecapa_loss=0.0002193, whisper_loss=0.09422, over 3885096.56 frames. ], batch size: 64, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:39:58,804 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.122e-02 2024-08-10 23:40:28,469 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 23:40:40,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2024-08-10 23:40:41,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=804450.0, ans=0.2 2024-08-10 23:40:43,058 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 23:40:49,637 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 23:40:56,325 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 23:41:01,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=804550.0, ans=0.0 2024-08-10 23:41:05,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8000, loss[loss=0.12, beats_loss=0.009287, ecapa_loss=0.0002407, whisper_loss=0.1084, over 19094.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01169, ecapa_loss=0.0002181, whisper_loss=0.09412, over 3859840.98 frames. ], batch size: 77, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:41:11,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=804650.0, ans=0.0 2024-08-10 23:41:28,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804750.0, ans=0.1 2024-08-10 23:41:51,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.694e+01 3.160e+01 3.631e+01 6.005e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 23:41:58,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. 
limit=15.0 2024-08-10 23:42:11,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8050, loss[loss=0.08978, beats_loss=0.01406, ecapa_loss=0.0002415, whisper_loss=0.0733, over 21363.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01166, ecapa_loss=0.0002174, whisper_loss=0.09379, over 3834134.55 frames. ], batch size: 93, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:42:12,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=805150.0, ans=0.07 2024-08-10 23:42:15,674 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 23:42:49,349 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 23:43:18,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8100, loss[loss=0.1066, beats_loss=0.01213, ecapa_loss=0.0001465, whisper_loss=0.09305, over 15692.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01161, ecapa_loss=0.0002177, whisper_loss=0.09402, over 3839824.69 frames. ], batch size: 58, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:43:24,772 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 23:44:04,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.753e+01 3.174e+01 3.635e+01 5.123e+01, threshold=6.349e+01, percent-clipped=0.0 2024-08-10 23:44:04,293 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 23:44:11,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=806050.0, ans=0.125 2024-08-10 23:44:14,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=806050.0, ans=0.125 2024-08-10 23:44:16,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806050.0, ans=0.1 2024-08-10 23:44:24,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8150, loss[loss=0.08872, beats_loss=0.01153, ecapa_loss=0.0002587, whisper_loss=0.07461, over 21911.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0116, ecapa_loss=0.0002184, whisper_loss=0.09411, over 3839925.00 frames. ], batch size: 95, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:44:36,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=806250.0, ans=0.5 2024-08-10 23:44:49,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806250.0, ans=0.1 2024-08-10 23:44:51,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=806350.0, ans=0.04949747468305833 2024-08-10 23:45:02,818 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 23:45:13,560 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 23:45:21,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=806550.0, ans=0.125 2024-08-10 23:45:30,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8200, loss[loss=0.1052, beats_loss=0.0145, ecapa_loss=0.0002015, whisper_loss=0.08873, over 21099.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01165, ecapa_loss=0.0002193, whisper_loss=0.09353, over 3889882.40 frames. ], batch size: 87, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:45:32,008 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 23:45:41,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2024-08-10 23:45:42,161 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 23:45:43,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=806750.0, ans=0.125 2024-08-10 23:45:57,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=806850.0, ans=0.0 2024-08-10 23:45:58,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=806850.0, ans=0.5 2024-08-10 23:46:06,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806850.0, ans=0.1 2024-08-10 23:46:06,764 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.40 vs. 
limit=6.0 2024-08-10 23:46:16,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.784e+01 3.124e+01 3.627e+01 5.044e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-10 23:46:16,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:19,120 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 23:46:22,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=807050.0, ans=0.125 2024-08-10 23:46:36,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8250, loss[loss=0.112, beats_loss=0.01371, ecapa_loss=0.000209, whisper_loss=0.09621, over 19544.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01167, ecapa_loss=0.0002189, whisper_loss=0.09359, over 3883021.75 frames. ], batch size: 79, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:46:40,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=807150.0, ans=0.125 2024-08-10 23:46:41,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0 2024-08-10 23:46:53,498 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:47:15,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=807450.0, ans=0.05 2024-08-10 23:47:18,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=807450.0, ans=0.2 2024-08-10 23:47:41,349 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 23:47:42,452 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8300, loss[loss=0.1038, beats_loss=0.01281, ecapa_loss=0.0002851, whisper_loss=0.0881, over 16946.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01177, ecapa_loss=0.0002189, whisper_loss=0.09287, over 3869114.05 frames. ], batch size: 72, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:47:46,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0 2024-08-10 23:48:01,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=807750.0, ans=0.0 2024-08-10 23:48:07,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-08-10 23:48:10,854 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 23:48:12,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=807850.0, ans=0.125 2024-08-10 23:48:16,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=807850.0, ans=0.0 2024-08-10 23:48:22,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807950.0, ans=0.1 2024-08-10 23:48:28,093 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 23:48:28,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807950.0, ans=0.125 2024-08-10 23:48:29,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.802e+01 3.186e+01 3.664e+01 3.254e+02, threshold=6.372e+01, percent-clipped=4.0 2024-08-10 23:48:37,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=808050.0, ans=0.0 2024-08-10 23:48:41,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-08-10 23:48:43,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=808050.0, ans=0.125 2024-08-10 23:48:48,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8350, loss[loss=0.1214, beats_loss=0.01038, ecapa_loss=0.0002556, whisper_loss=0.1085, over 21714.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01166, ecapa_loss=0.0002206, whisper_loss=0.09347, over 3881296.29 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:48:50,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=808150.0, ans=0.125 2024-08-10 23:48:52,476 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 23:48:55,022 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 23:48:58,612 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 23:48:59,918 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 23:49:25,107 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:49:37,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=15.0 2024-08-10 23:49:48,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-10 23:49:53,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8400, loss[loss=0.1124, beats_loss=0.01057, ecapa_loss=0.0001735, whisper_loss=0.1001, over 17267.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01168, ecapa_loss=0.0002202, whisper_loss=0.09328, over 3868666.05 frames. ], batch size: 66, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:50:11,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=808750.0, ans=0.125 2024-08-10 23:50:15,003 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 23:50:39,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.636e+01 3.039e+01 3.423e+01 5.250e+01, threshold=6.078e+01, percent-clipped=0.0 2024-08-10 23:50:46,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809050.0, ans=0.1 2024-08-10 23:50:49,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-08-10 23:50:53,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=809050.0, ans=0.125 2024-08-10 23:50:54,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=809050.0, ans=0.125 2024-08-10 23:50:56,685 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 23:50:59,311 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8450, loss[loss=0.1057, beats_loss=0.01256, ecapa_loss=0.0002429, whisper_loss=0.09067, over 15351.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0116, ecapa_loss=0.00022, whisper_loss=0.09383, over 3848948.83 frames. ], batch size: 65, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:51:29,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=809350.0, ans=22.5 2024-08-10 23:51:31,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=809350.0, ans=0.0 2024-08-10 23:52:06,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8500, loss[loss=0.11, beats_loss=0.009755, ecapa_loss=0.0002494, whisper_loss=0.09778, over 22000.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01153, ecapa_loss=0.0002224, whisper_loss=0.09377, over 3846992.76 frames. ], batch size: 87, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:52:21,673 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 23:52:23,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809750.0, ans=0.125 2024-08-10 23:52:25,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-08-10 23:52:29,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=809750.0, ans=0.2 2024-08-10 23:52:42,912 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-10 23:52:44,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-10 23:52:47,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=809850.0, ans=0.1 2024-08-10 23:52:53,510 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 23:52:59,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.787e+01 3.102e+01 3.651e+01 5.135e+01, threshold=6.204e+01, percent-clipped=0.0 2024-08-10 23:53:08,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=810050.0, ans=0.2 2024-08-10 23:53:19,596 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 23:53:21,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8550, loss[loss=0.1714, beats_loss=0.004477, ecapa_loss=0.0002496, whisper_loss=0.1644, over 15680.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01149, ecapa_loss=0.0002211, whisper_loss=0.09416, over 3853650.88 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:53:27,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=810150.0, ans=0.0 2024-08-10 23:53:30,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-10 23:53:31,024 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 23:53:39,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=810250.0, ans=0.0 2024-08-10 23:53:51,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-10 23:53:51,720 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 23:53:59,231 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 23:53:59,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810350.0, ans=0.125 2024-08-10 23:54:00,980 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 23:54:03,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2024-08-10 23:54:07,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=810450.0, ans=0.0 2024-08-10 23:54:10,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810450.0, ans=0.125 2024-08-10 23:54:14,714 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-10 23:54:32,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=810550.0, ans=0.0 2024-08-10 23:54:34,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8600, loss[loss=0.1202, beats_loss=0.009613, ecapa_loss=0.0002466, whisper_loss=0.1081, over 18977.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01157, ecapa_loss=0.0002205, whisper_loss=0.09389, over 3826305.19 frames. ], batch size: 75, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:54:48,025 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 23:54:49,570 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 17 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-10 23:54:57,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=810750.0, ans=0.2 2024-08-10 23:55:03,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2024-08-10 23:55:06,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=810850.0, ans=0.125 2024-08-10 23:55:11,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=810850.0, ans=0.125 2024-08-10 23:55:15,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=810950.0, ans=0.2 2024-08-10 23:55:23,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.846e+01 3.382e+01 3.840e+01 6.128e+01, threshold=6.764e+01, percent-clipped=0.0 2024-08-10 23:55:31,019 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 23:55:36,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=811050.0, ans=0.5 2024-08-10 23:55:43,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811150.0, ans=0.1 2024-08-10 23:55:44,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8650, loss[loss=0.119, beats_loss=0.01091, ecapa_loss=0.0002116, whisper_loss=0.106, over 15948.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01155, ecapa_loss=0.0002194, whisper_loss=0.09379, over 3820297.63 frames. ], batch size: 61, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:55:50,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=811150.0, ans=0.0 2024-08-10 23:55:51,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=811150.0, ans=0.0 2024-08-10 23:55:58,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=811250.0, ans=0.125 2024-08-10 23:56:08,734 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 23:56:16,836 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 23:56:18,148 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 23:56:21,770 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 23:56:32,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2024-08-10 23:56:39,129 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 23:56:42,061 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 23:56:55,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8700, loss[loss=0.1125, beats_loss=0.009543, ecapa_loss=0.0002439, whisper_loss=0.1005, over 18183.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.00022, whisper_loss=0.09411, over 3818176.61 frames. ], batch size: 72, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:57:02,427 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 18 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-10 23:57:02,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=12.0 2024-08-10 23:57:04,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2024-08-10 23:57:06,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=811650.0, ans=0.125 2024-08-10 23:57:19,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=811750.0, ans=0.2 2024-08-10 23:57:27,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=811850.0, ans=0.125 2024-08-10 23:57:28,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=811850.0, ans=0.125 2024-08-10 23:57:41,104 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 23:57:43,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. 
limit=22.5 2024-08-10 23:57:43,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.694e+01 2.974e+01 3.412e+01 6.571e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-10 23:57:53,212 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 23:58:04,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8750, loss[loss=0.1112, beats_loss=0.01073, ecapa_loss=0.0002104, whisper_loss=0.09838, over 20248.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01142, ecapa_loss=0.0002235, whisper_loss=0.09473, over 3806852.34 frames. ], batch size: 80, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:58:06,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=812150.0, ans=0.5 2024-08-10 23:58:08,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=812150.0, ans=0.1 2024-08-10 23:58:12,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=15.0 2024-08-10 23:58:28,882 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 23:58:29,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=812250.0, ans=0.2 2024-08-10 23:58:30,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=812350.0, ans=0.125 2024-08-10 23:58:43,837 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 23:59:12,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8800, loss[loss=0.08974, beats_loss=0.01407, ecapa_loss=0.0001709, whisper_loss=0.07396, over 17597.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01157, ecapa_loss=0.0002219, whisper_loss=0.09388, over 3826790.37 frames. ], batch size: 68, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:59:14,840 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 23:59:19,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-10 23:59:27,699 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 23:59:43,502 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 23:59:46,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=812850.0, ans=0.125 2024-08-10 23:59:56,945 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 23:59:59,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.902e+01 3.394e+01 3.776e+01 5.499e+01, threshold=6.788e+01, percent-clipped=0.0 2024-08-11 00:00:21,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8850, loss[loss=0.1064, beats_loss=0.01203, ecapa_loss=0.0002122, whisper_loss=0.09228, over 17502.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01151, ecapa_loss=0.0002204, whisper_loss=0.09392, over 3822910.74 frames. ], batch size: 68, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:00:40,877 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-11 00:00:53,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=15.0 2024-08-11 00:01:22,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=813550.0, ans=0.125 2024-08-11 00:01:23,142 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 00:01:29,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=813650.0, ans=0.125 2024-08-11 00:01:30,499 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8900, loss[loss=0.1124, beats_loss=0.011, ecapa_loss=0.0002441, whisper_loss=0.09898, over 19000.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01158, ecapa_loss=0.0002192, whisper_loss=0.0941, over 3859860.19 frames. ], batch size: 77, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:01:30,709 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 00:01:47,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2024-08-11 00:01:51,500 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 00:01:53,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=813750.0, ans=0.0 2024-08-11 00:02:18,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.661e+01 2.983e+01 3.454e+01 5.391e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 00:02:26,218 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 00:02:26,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=814050.0, ans=0.0 2024-08-11 00:02:37,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 8950, loss[loss=0.1092, beats_loss=0.01267, ecapa_loss=0.0001966, whisper_loss=0.09458, over 21966.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01158, ecapa_loss=0.0002197, whisper_loss=0.09473, over 3897665.68 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:02:53,284 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-11 00:02:57,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814250.0, ans=0.1 2024-08-11 00:03:03,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=814350.0, ans=0.125 2024-08-11 00:03:04,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=814350.0, ans=0.125 2024-08-11 00:03:20,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=22.5 2024-08-11 00:03:23,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=814450.0, ans=0.125 2024-08-11 00:03:44,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9000, loss[loss=0.1033, beats_loss=0.01639, ecapa_loss=0.0001633, whisper_loss=0.0853, over 16938.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01165, ecapa_loss=0.0002182, whisper_loss=0.09454, over 3907405.70 frames. 
], batch size: 66, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:03:44,141 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 00:04:24,169 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0006942, whisper_loss=0.2529, over 922467.00 frames. 2024-08-11 00:04:43,392 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 00:04:57,275 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1026, 3.4560, 3.9740, 3.5198], device='cuda:1') 2024-08-11 00:06:36,415 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2798, 1.7277, 1.2854, 1.0396, 1.1689, 1.0427, 1.5407, 1.4642], device='cuda:1') 2024-08-11 00:06:37,823 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on AT_audioset: loss=0.02592, beats_loss=0.02592, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 00:06:37,827 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 00:06:53,895 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 00:06:57,550 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 00:06:59,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=814750.0, ans=0.0 2024-08-11 00:07:09,292 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 00:07:10,730 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 00:07:20,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=814850.0, ans=0.125 2024-08-11 00:07:23,612 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 00:07:30,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.866e+01 3.382e+01 4.145e+01 7.682e+01, threshold=6.764e+01, percent-clipped=3.0 2024-08-11 00:07:40,716 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 00:07:50,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=815050.0, ans=0.0 2024-08-11 00:07:54,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9050, loss[loss=0.08491, beats_loss=0.01534, ecapa_loss=0.0001911, whisper_loss=0.06765, over 21423.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01169, ecapa_loss=0.0002177, whisper_loss=0.09469, over 3899213.70 frames. ], batch size: 87, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:08:09,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815250.0, ans=0.125 2024-08-11 00:08:16,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=815250.0, ans=0.125 2024-08-11 00:08:46,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=815450.0, ans=0.125 2024-08-11 00:09:06,995 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 00:09:08,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9100, loss[loss=0.1041, beats_loss=0.01125, ecapa_loss=0.0002548, whisper_loss=0.0903, over 21704.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01165, ecapa_loss=0.0002184, whisper_loss=0.09475, over 3926778.09 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:09:16,003 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 00:09:20,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=815650.0, ans=0.07 2024-08-11 00:09:24,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=815750.0, ans=0.125 2024-08-11 00:09:26,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815750.0, ans=0.125 2024-08-11 00:09:48,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=815850.0, ans=0.0 2024-08-11 00:09:50,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.73 vs. 
limit=22.5 2024-08-11 00:09:58,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.712e+01 2.999e+01 3.385e+01 5.028e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 00:10:00,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=815950.0, ans=0.0 2024-08-11 00:10:00,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=815950.0, ans=0.0 2024-08-11 00:10:00,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=815950.0, ans=0.125 2024-08-11 00:10:05,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816050.0, ans=0.125 2024-08-11 00:10:20,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9150, loss[loss=0.1138, beats_loss=0.01146, ecapa_loss=0.0002086, whisper_loss=0.1002, over 22878.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01164, ecapa_loss=0.0002175, whisper_loss=0.09492, over 3922510.60 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:10:25,174 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 00:10:55,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=816350.0, ans=0.125 2024-08-11 00:11:01,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=816350.0, ans=0.125 2024-08-11 00:11:02,766 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 00:11:13,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=816450.0, ans=0.0 2024-08-11 00:11:18,367 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 00:11:35,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=816650.0, ans=0.2 2024-08-11 00:11:36,475 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9200, loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0002592, whisper_loss=0.08843, over 15102.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0117, ecapa_loss=0.0002176, whisper_loss=0.0946, over 3943279.26 frames. ], batch size: 64, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:12:01,449 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 00:12:03,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5 2024-08-11 00:12:07,807 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 00:12:17,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=816850.0, ans=0.125 2024-08-11 00:12:17,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816850.0, ans=0.125 2024-08-11 00:12:19,596 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 00:12:21,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=816950.0, ans=0.125 2024-08-11 00:12:22,608 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 00:12:24,118 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 00:12:27,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.632e+01 3.033e+01 3.497e+01 1.383e+02, threshold=6.066e+01, percent-clipped=1.0 2024-08-11 00:12:41,691 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:12:44,287 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 00:12:48,904 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9250, loss[loss=0.1196, beats_loss=0.0122, ecapa_loss=0.0002141, whisper_loss=0.1052, over 15077.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01165, ecapa_loss=0.0002194, whisper_loss=0.09488, over 3943122.24 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:12:54,717 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.295e-03 2024-08-11 00:12:58,599 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 00:13:10,077 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 00:13:15,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=817250.0, ans=0.125 2024-08-11 00:13:15,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=817250.0, ans=0.125 2024-08-11 00:13:24,292 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 00:13:32,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. 
limit=15.0 2024-08-11 00:13:34,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=817450.0, ans=0.125 2024-08-11 00:13:43,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=817450.0, ans=0.05 2024-08-11 00:13:46,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=22.5 2024-08-11 00:13:51,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=817550.0, ans=0.125 2024-08-11 00:13:53,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5 2024-08-11 00:13:57,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=817550.0, ans=0.0 2024-08-11 00:14:02,188 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 00:14:06,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9300, loss[loss=0.1011, beats_loss=0.01118, ecapa_loss=0.0001937, whisper_loss=0.08794, over 15236.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01168, ecapa_loss=0.0002189, whisper_loss=0.09437, over 3936800.94 frames. 
], batch size: 59, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:14:26,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=817750.0, ans=0.125 2024-08-11 00:14:29,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=817750.0, ans=0.0 2024-08-11 00:14:42,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=817850.0, ans=0.125 2024-08-11 00:14:45,031 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 00:14:45,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=817850.0, ans=0.125 2024-08-11 00:14:58,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.667e+01 2.966e+01 3.383e+01 7.144e+01, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 00:15:04,039 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 00:15:09,369 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 00:15:17,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=818050.0, ans=0.125 2024-08-11 00:15:19,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9350, loss[loss=0.1243, beats_loss=0.009633, ecapa_loss=0.000228, whisper_loss=0.1124, over 22274.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01168, ecapa_loss=0.0002175, whisper_loss=0.0944, over 3903112.28 frames. 
], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:15:22,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=818150.0, ans=0.2 2024-08-11 00:15:23,676 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 00:15:28,175 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 00:15:49,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=818350.0, ans=15.0 2024-08-11 00:15:57,018 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.189e-02 2024-08-11 00:16:07,010 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 00:16:08,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=818450.0, ans=0.0 2024-08-11 00:16:22,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-08-11 00:16:32,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9400, loss[loss=0.09735, beats_loss=0.01138, ecapa_loss=0.0002229, whisper_loss=0.08374, over 14705.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01173, ecapa_loss=0.0002178, whisper_loss=0.09325, over 3862806.03 frames. 
], batch size: 58, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:16:35,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=818650.0, ans=0.125 2024-08-11 00:16:39,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=818650.0, ans=0.0 2024-08-11 00:16:48,260 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 00:16:54,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=818750.0, ans=0.05 2024-08-11 00:16:55,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-08-11 00:16:56,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818750.0, ans=0.1 2024-08-11 00:16:58,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-11 00:16:58,793 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 29 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 00:17:06,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=818850.0, ans=0.125 2024-08-11 00:17:09,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-11 00:17:16,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-08-11 00:17:19,091 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 00:17:20,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.820e+01 3.162e+01 3.777e+01 5.486e+01, threshold=6.323e+01, percent-clipped=0.0 2024-08-11 00:17:41,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9450, loss[loss=0.09706, beats_loss=0.01333, ecapa_loss=0.0001945, whisper_loss=0.08178, over 22812.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01173, ecapa_loss=0.000217, whisper_loss=0.09284, over 3875354.22 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:17:44,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=15.0 2024-08-11 00:17:48,720 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 00:18:00,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=819250.0, ans=0.0 2024-08-11 00:18:11,276 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
24 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 00:18:17,172 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:18:17,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=819350.0, ans=0.2 2024-08-11 00:18:19,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=819350.0, ans=0.0 2024-08-11 00:18:23,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=819450.0, ans=0.0 2024-08-11 00:18:28,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=819450.0, ans=0.2 2024-08-11 00:18:34,373 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-11 00:18:35,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819550.0, ans=0.1 2024-08-11 00:18:48,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9500, loss[loss=0.1038, beats_loss=0.009838, ecapa_loss=0.0002639, whisper_loss=0.09137, over 22187.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01176, ecapa_loss=0.0002171, whisper_loss=0.09256, over 3897429.53 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:19:13,275 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 00:19:14,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819750.0, ans=0.1 2024-08-11 00:19:37,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.851e+01 3.283e+01 3.927e+01 7.522e+01, threshold=6.566e+01, percent-clipped=2.0 2024-08-11 00:19:45,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=820050.0, ans=0.125 2024-08-11 00:19:55,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=820050.0, ans=0.125 2024-08-11 00:19:58,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9550, loss[loss=0.1148, beats_loss=0.00993, ecapa_loss=0.0001952, whisper_loss=0.1029, over 18241.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01174, ecapa_loss=0.000218, whisper_loss=0.0922, over 3885630.76 frames. ], batch size: 66, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:19:58,709 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 00:20:01,215 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 00:20:06,793 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 00:20:17,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=820250.0, ans=0.2 2024-08-11 00:20:21,291 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 00:20:25,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.89 vs. 
limit=15.0 2024-08-11 00:20:29,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.10 vs. limit=10.0 2024-08-11 00:20:34,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820350.0, ans=0.1 2024-08-11 00:20:57,025 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 00:20:58,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=820550.0, ans=0.125 2024-08-11 00:21:04,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9600, loss[loss=0.1198, beats_loss=0.009821, ecapa_loss=0.0002092, whisper_loss=0.1079, over 15367.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01167, ecapa_loss=0.0002172, whisper_loss=0.09271, over 3867643.83 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:21:21,866 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 00:21:23,038 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-11 00:21:25,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=820750.0, ans=15.0 2024-08-11 00:21:35,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=820850.0, ans=0.125 2024-08-11 00:21:37,878 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 00:21:40,490 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
34 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 00:21:46,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=820950.0, ans=0.07 2024-08-11 00:21:50,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.676e+01 3.117e+01 3.565e+01 7.658e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 00:21:51,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820950.0, ans=0.1 2024-08-11 00:22:04,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0 2024-08-11 00:22:08,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=821050.0, ans=0.125 2024-08-11 00:22:10,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9650, loss[loss=0.1096, beats_loss=0.01268, ecapa_loss=0.0002014, whisper_loss=0.0949, over 17491.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01164, ecapa_loss=0.0002185, whisper_loss=0.09274, over 3815700.20 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:22:16,643 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 00:22:20,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=821150.0, ans=0.2 2024-08-11 00:22:27,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=821250.0, ans=0.0 2024-08-11 00:22:31,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821250.0, ans=0.1 2024-08-11 00:23:07,618 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
17 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-11 00:23:16,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9700, loss[loss=0.09979, beats_loss=0.01223, ecapa_loss=0.0002612, whisper_loss=0.08494, over 19710.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0116, ecapa_loss=0.0002216, whisper_loss=0.09261, over 3819133.52 frames. ], batch size: 83, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:23:22,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=821650.0, ans=0.125 2024-08-11 00:23:24,750 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 00:23:30,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=821750.0, ans=10.0 2024-08-11 00:23:33,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=821750.0, ans=0.125 2024-08-11 00:23:35,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=821750.0, ans=0.125 2024-08-11 00:23:46,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=821850.0, ans=0.125 2024-08-11 00:23:49,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=821850.0, ans=0.2 2024-08-11 00:24:02,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.803e+01 3.195e+01 3.718e+01 6.974e+01, threshold=6.391e+01, percent-clipped=1.0 2024-08-11 00:24:21,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822150.0, ans=0.1 2024-08-11 00:24:22,392 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9750, loss[loss=0.1119, beats_loss=0.01032, ecapa_loss=0.0002628, 
whisper_loss=0.0989, over 21373.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01162, ecapa_loss=0.0002208, whisper_loss=0.09295, over 3790543.30 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:24:41,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2024-08-11 00:24:44,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=822250.0, ans=0.2 2024-08-11 00:24:47,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=822350.0, ans=0.1 2024-08-11 00:24:50,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=822350.0, ans=0.0 2024-08-11 00:25:03,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=822450.0, ans=0.07 2024-08-11 00:25:13,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822550.0, ans=0.0 2024-08-11 00:25:20,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-08-11 00:25:26,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9800, loss[loss=0.1117, beats_loss=0.01138, ecapa_loss=0.0002014, whisper_loss=0.09826, over 20207.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01155, ecapa_loss=0.0002187, whisper_loss=0.09368, over 3819358.97 frames. ], batch size: 81, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:25:34,452 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 00:25:50,383 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 00:25:59,664 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 00:26:12,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.728e+01 3.058e+01 3.533e+01 7.097e+01, threshold=6.116e+01, percent-clipped=1.0 2024-08-11 00:26:14,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822950.0, ans=0.1 2024-08-11 00:26:25,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=823050.0, ans=0.125 2024-08-11 00:26:29,517 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 00:26:31,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9850, loss[loss=0.1093, beats_loss=0.0117, ecapa_loss=0.0001458, whisper_loss=0.09615, over 18727.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0116, ecapa_loss=0.0002175, whisper_loss=0.09349, over 3838698.92 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:26:51,687 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 00:26:53,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823250.0, ans=0.125 2024-08-11 00:26:54,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=823250.0, ans=0.125 2024-08-11 00:27:08,300 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 00:27:10,879 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 00:27:14,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=12.0 2024-08-11 00:27:20,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=823450.0, ans=0.0 2024-08-11 00:27:20,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-11 00:27:26,766 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 00:27:28,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=823550.0, ans=0.95 2024-08-11 00:27:32,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=823550.0, ans=10.0 2024-08-11 00:27:33,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=823550.0, ans=0.015 2024-08-11 00:27:34,746 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 00:27:37,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9900, loss[loss=0.09546, beats_loss=0.01098, ecapa_loss=0.0002546, whisper_loss=0.08194, over 15115.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01162, ecapa_loss=0.0002172, whisper_loss=0.09371, over 3869841.88 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:27:39,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823650.0, ans=0.125 2024-08-11 00:28:11,707 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 00:28:23,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=823950.0, ans=0.2 2024-08-11 00:28:23,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.723e+01 2.993e+01 3.476e+01 9.466e+01, threshold=5.985e+01, percent-clipped=1.0 2024-08-11 00:28:29,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=824050.0, ans=0.2 2024-08-11 00:28:32,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=824050.0, ans=0.125 2024-08-11 00:28:43,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 9950, loss[loss=0.09471, beats_loss=0.01233, ecapa_loss=0.0001637, whisper_loss=0.08074, over 15415.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002189, whisper_loss=0.09421, over 3879517.88 frames. ], batch size: 57, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:29:07,114 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 00:29:17,200 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. 
limit=15.0 2024-08-11 00:29:20,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=824350.0, ans=0.0 2024-08-11 00:29:26,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=824450.0, ans=0.2 2024-08-11 00:29:29,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=824450.0, ans=0.0 2024-08-11 00:29:48,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10000, loss[loss=0.1276, beats_loss=0.01064, ecapa_loss=0.0002, whisper_loss=0.1149, over 18863.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01158, ecapa_loss=0.0002184, whisper_loss=0.09474, over 3833153.96 frames. ], batch size: 72, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:29:52,458 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 00:29:55,619 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-11 00:30:16,637 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 00:30:27,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2024-08-11 00:30:35,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=824950.0, ans=6.0 2024-08-11 00:30:37,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.691e+01 3.032e+01 3.574e+01 5.004e+01, threshold=6.065e+01, percent-clipped=0.0 2024-08-11 00:30:37,634 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 00:30:43,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=825050.0, ans=0.035 2024-08-11 00:30:43,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=825050.0, ans=0.2 2024-08-11 00:30:48,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=825050.0, ans=0.125 2024-08-11 00:30:49,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=825050.0, ans=0.0 2024-08-11 00:30:55,882 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 29 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 00:30:56,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10050, loss[loss=0.133, beats_loss=0.008574, ecapa_loss=0.0002145, whisper_loss=0.1223, over 17272.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01154, ecapa_loss=0.000219, whisper_loss=0.09487, over 3858070.54 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:30:57,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=825150.0, ans=0.125 2024-08-11 00:31:06,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=825150.0, ans=0.125 2024-08-11 00:31:11,614 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 00:31:25,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=825250.0, ans=0.0 2024-08-11 00:32:10,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=825450.0, ans=22.5 2024-08-11 00:32:25,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=825550.0, ans=0.0 2024-08-11 00:32:31,659 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 00:32:35,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10100, loss[loss=0.09519, beats_loss=0.01464, ecapa_loss=0.0001964, whisper_loss=0.07859, over 21875.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0115, ecapa_loss=0.0002197, whisper_loss=0.09494, over 3878464.12 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:32:57,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=825750.0, ans=0.2 2024-08-11 00:32:59,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=825750.0, ans=0.0 2024-08-11 00:33:08,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.43 vs. limit=15.0 2024-08-11 00:33:11,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=825750.0, ans=0.125 2024-08-11 00:33:13,686 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 00:33:20,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=825850.0, ans=0.0 2024-08-11 00:33:36,405 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 00:33:41,256 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 00:33:45,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825950.0, ans=0.125 2024-08-11 00:33:48,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=825950.0, ans=0.125 2024-08-11 00:33:55,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.818e+01 3.128e+01 3.591e+01 5.480e+01, threshold=6.256e+01, percent-clipped=0.0 2024-08-11 00:33:55,233 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-11 00:33:58,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=825950.0, ans=0.125 2024-08-11 00:34:05,544 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 00:34:09,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0 2024-08-11 00:34:34,681 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10150, loss[loss=0.114, beats_loss=0.008824, ecapa_loss=0.0002133, whisper_loss=0.103, over 16553.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01151, ecapa_loss=0.0002197, whisper_loss=0.09502, over 3892711.39 frames. 
], batch size: 64, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:35:22,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=826350.0, ans=0.125 2024-08-11 00:35:32,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=826350.0, ans=0.2 2024-08-11 00:35:38,890 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-11 00:35:55,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=826450.0, ans=0.5 2024-08-11 00:36:10,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=826550.0, ans=0.125 2024-08-11 00:36:11,828 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 00:36:20,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=826550.0, ans=0.125 2024-08-11 00:36:37,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10200, loss[loss=0.09755, beats_loss=0.01184, ecapa_loss=0.0002461, whisper_loss=0.08326, over 22031.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01149, ecapa_loss=0.0002198, whisper_loss=0.09489, over 3926439.83 frames. 
], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:36:51,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=826650.0, ans=0.125 2024-08-11 00:37:03,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=826750.0, ans=0.125 2024-08-11 00:37:07,353 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:37:15,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=826750.0, ans=0.2 2024-08-11 00:37:22,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=826850.0, ans=0.125 2024-08-11 00:37:26,650 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 00:37:29,719 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 00:37:37,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826950.0, ans=0.1 2024-08-11 00:37:40,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.717e+01 3.021e+01 3.434e+01 5.708e+01, threshold=6.043e+01, percent-clipped=0.0 2024-08-11 00:37:43,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-11 00:37:48,765 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 00:38:03,708 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10250, loss[loss=0.1041, beats_loss=0.01292, ecapa_loss=0.0001809, whisper_loss=0.08939, over 21726.00 frames. 
], tot_loss[loss=0.1088, beats_loss=0.01152, ecapa_loss=0.0002198, whisper_loss=0.09512, over 3932680.13 frames. ], batch size: 86, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:38:04,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=12.0 2024-08-11 00:38:11,361 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 00:38:28,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=827250.0, ans=0.125 2024-08-11 00:38:30,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2024-08-11 00:39:06,638 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 00:39:07,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=827550.0, ans=0.0 2024-08-11 00:39:17,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=827550.0, ans=0.0 2024-08-11 00:39:19,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10300, loss[loss=0.09433, beats_loss=0.01112, ecapa_loss=0.0002216, whisper_loss=0.08099, over 20427.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01155, ecapa_loss=0.0002192, whisper_loss=0.09484, over 3945626.41 frames. ], batch size: 83, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:39:25,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=827650.0, ans=10.0 2024-08-11 00:39:29,156 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 00:39:38,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=827750.0, ans=0.125 2024-08-11 00:39:48,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=827750.0, ans=0.1 2024-08-11 00:39:49,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-11 00:40:05,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=827950.0, ans=0.125 2024-08-11 00:40:13,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.682e+01 2.875e+01 3.472e+01 4.715e+01, threshold=5.749e+01, percent-clipped=0.0 2024-08-11 00:40:36,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10350, loss[loss=0.1116, beats_loss=0.01465, ecapa_loss=0.000184, whisper_loss=0.0951, over 19328.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002189, whisper_loss=0.09476, over 3936760.57 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:40:36,428 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 00:40:37,788 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-11 00:40:55,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828250.0, ans=0.125 2024-08-11 00:41:08,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=828350.0, ans=0.0 2024-08-11 00:41:10,262 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 00:41:21,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-11 00:41:26,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-11 00:41:27,415 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 00:41:32,889 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-08-11 00:41:50,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-11 00:41:51,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=828550.0, ans=0.0 2024-08-11 00:41:54,220 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10400, loss[loss=0.1097, beats_loss=0.01297, ecapa_loss=0.0001654, whisper_loss=0.09504, over 24650.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01159, ecapa_loss=0.0002165, whisper_loss=0.09458, over 3922334.69 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:41:55,415 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 00:41:58,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.20 vs. limit=22.5 2024-08-11 00:42:02,189 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 00:42:10,613 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-11 00:42:19,688 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 00:42:21,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=828750.0, ans=0.0 2024-08-11 00:42:30,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=828850.0, ans=0.5 2024-08-11 00:42:35,700 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 00:42:47,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828950.0, ans=0.125 2024-08-11 00:42:49,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.708e+01 2.999e+01 3.498e+01 5.568e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 00:43:07,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=829050.0, ans=0.5 2024-08-11 00:43:11,236 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 00:43:14,202 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10450, loss[loss=0.1214, beats_loss=0.01198, ecapa_loss=0.0001995, whisper_loss=0.1074, over 23990.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002175, whisper_loss=0.09459, over 3906141.70 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:44:13,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=829450.0, ans=0.0 2024-08-11 00:44:16,509 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 00:44:18,175 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
12 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 00:44:19,941 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 00:44:35,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10500, loss[loss=0.115, beats_loss=0.009606, ecapa_loss=0.000246, whisper_loss=0.1029, over 18965.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01159, ecapa_loss=0.0002176, whisper_loss=0.09465, over 3914001.73 frames. ], batch size: 75, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:44:43,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=829650.0, ans=0.2 2024-08-11 00:44:48,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-08-11 00:45:02,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=829750.0, ans=0.2 2024-08-11 00:45:04,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=829850.0, ans=0.5 2024-08-11 00:45:06,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=829850.0, ans=0.0 2024-08-11 00:45:07,073 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:45:11,768 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 00:45:15,674 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 00:45:15,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=829850.0, ans=0.0 2024-08-11 00:45:16,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=829850.0, ans=0.0 2024-08-11 00:45:16,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-08-11 00:45:17,250 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 00:45:27,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.730e+01 2.985e+01 3.287e+01 5.938e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 00:45:42,466 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 18 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-11 00:45:43,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-11 00:45:45,450 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 00:45:47,010 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 00:45:49,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10550, loss[loss=0.09111, beats_loss=0.01444, ecapa_loss=0.0002459, whisper_loss=0.07421, over 17016.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01167, ecapa_loss=0.0002174, whisper_loss=0.09398, over 3937303.82 frames. ], batch size: 72, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:45:54,470 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 41 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 00:45:58,357 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 00:46:24,134 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 00:46:32,787 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 00:46:50,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=830450.0, ans=0.02 2024-08-11 00:46:50,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=12.0 2024-08-11 00:46:58,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=830550.0, ans=0.2 2024-08-11 00:47:08,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10600, loss[loss=0.105, beats_loss=0.01244, ecapa_loss=0.0002212, whisper_loss=0.09035, over 17870.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01166, ecapa_loss=0.0002184, whisper_loss=0.0939, over 3893416.61 frames. ], batch size: 71, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:47:32,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2024-08-11 00:47:50,461 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 00:48:00,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.647e+01 3.131e+01 3.600e+01 5.761e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 00:48:03,850 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 00:48:23,829 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10650, loss[loss=0.09209, beats_loss=0.01247, ecapa_loss=0.0002321, whisper_loss=0.07729, over 21022.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01168, ecapa_loss=0.0002174, whisper_loss=0.09398, over 3896320.41 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:48:25,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=831150.0, ans=0.1 2024-08-11 00:48:43,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=831250.0, ans=0.125 2024-08-11 00:48:50,002 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 00:49:40,200 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10700, loss[loss=0.1237, beats_loss=0.01146, ecapa_loss=0.0001833, whisper_loss=0.1104, over 14974.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01172, ecapa_loss=0.000216, whisper_loss=0.09378, over 3925953.22 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:49:42,115 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 00:49:43,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=831650.0, ans=0.0 2024-08-11 00:49:49,323 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.947e-02 2024-08-11 00:49:56,624 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 00:50:31,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.817e+01 3.065e+01 3.573e+01 8.621e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-11 00:50:34,701 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 00:50:46,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=832050.0, ans=0.125 2024-08-11 00:50:53,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10750, loss[loss=0.09296, beats_loss=0.01314, ecapa_loss=0.0002115, whisper_loss=0.0777, over 19090.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01174, ecapa_loss=0.0002148, whisper_loss=0.09348, over 3904409.23 frames. ], batch size: 80, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:51:12,473 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-11 00:51:22,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=12.0 2024-08-11 00:51:24,331 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 00:51:24,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832350.0, ans=0.1 2024-08-11 00:51:28,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=12.0 2024-08-11 00:51:30,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=832350.0, ans=0.0 2024-08-11 00:51:45,818 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 00:51:56,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=15.0 2024-08-11 00:51:59,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=832550.0, ans=0.125 2024-08-11 00:52:09,285 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 00:52:11,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10800, loss[loss=0.1065, beats_loss=0.009214, ecapa_loss=0.0002257, whisper_loss=0.09501, over 18346.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0116, ecapa_loss=0.0002161, whisper_loss=0.09496, over 3919165.34 frames. ], batch size: 69, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:52:11,214 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 00:52:15,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832650.0, ans=0.1 2024-08-11 00:52:16,925 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 00:52:27,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=832750.0, ans=0.125 2024-08-11 00:52:40,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=832750.0, ans=0.0 2024-08-11 00:52:41,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=832850.0, ans=0.125 2024-08-11 00:52:52,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=832850.0, ans=0.0 2024-08-11 00:52:54,301 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
30 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 00:52:54,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832850.0, ans=0.125 2024-08-11 00:53:04,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.723e+01 3.219e+01 3.827e+01 1.923e+02, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 00:53:25,581 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 00:53:26,874 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10850, loss[loss=0.1121, beats_loss=0.009518, ecapa_loss=0.00024, whisper_loss=0.1002, over 19080.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01143, ecapa_loss=0.0002188, whisper_loss=0.09536, over 3917092.87 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:53:42,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=833250.0, ans=0.0 2024-08-11 00:53:48,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833250.0, ans=0.1 2024-08-11 00:54:03,675 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 from AS 2024-08-11 00:54:05,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=833350.0, ans=0.0 2024-08-11 00:54:36,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=833550.0, ans=0.2 2024-08-11 00:54:43,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10900, loss[loss=0.09014, beats_loss=0.01447, ecapa_loss=0.0001793, whisper_loss=0.07388, over 20077.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.0115, ecapa_loss=0.0002191, whisper_loss=0.09545, over 3934573.67 frames. 
], batch size: 82, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:55:00,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=833750.0, ans=0.125 2024-08-11 00:55:05,667 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 00:55:07,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-11 00:55:33,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=833950.0, ans=0.125 2024-08-11 00:55:33,602 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.411e-02 2024-08-11 00:55:33,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2024-08-11 00:55:35,791 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.641e+01 2.975e+01 3.587e+01 5.714e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-11 00:55:58,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 10950, loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.0002279, whisper_loss=0.09246, over 21444.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01146, ecapa_loss=0.0002187, whisper_loss=0.09548, over 3961019.51 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:56:03,128 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 00:56:12,513 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 00:56:26,228 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-11 00:56:26,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=834250.0, ans=0.125 2024-08-11 00:56:29,185 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 26 from Vox, 36 from AS 2024-08-11 00:56:52,734 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 from AS 2024-08-11 00:57:00,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=834550.0, ans=0.025 2024-08-11 00:57:13,081 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11000, loss[loss=0.1123, beats_loss=0.01277, ecapa_loss=0.000281, whisper_loss=0.09671, over 20745.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01142, ecapa_loss=0.0002195, whisper_loss=0.0955, over 3934463.84 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:57:31,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=834750.0, ans=0.125 2024-08-11 00:57:41,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=834750.0, ans=0.125 2024-08-11 00:57:49,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=834850.0, ans=0.1 2024-08-11 00:57:51,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=834850.0, ans=0.0 2024-08-11 00:58:06,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.814e+01 3.042e+01 3.466e+01 5.998e+01, threshold=6.084e+01, percent-clipped=1.0 2024-08-11 00:58:11,726 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.769e-02 2024-08-11 00:58:13,232 INFO [train_multi_KD3.py:844] (1/4) A total 
of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 00:58:15,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835050.0, ans=0.125 2024-08-11 00:58:27,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=835050.0, ans=0.125 2024-08-11 00:58:30,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11050, loss[loss=0.1287, beats_loss=0.008814, ecapa_loss=0.0001856, whisper_loss=0.118, over 17597.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01147, ecapa_loss=0.0002197, whisper_loss=0.09543, over 3936103.37 frames. ], batch size: 64, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:58:40,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-11 00:58:43,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=835150.0, ans=0.2 2024-08-11 00:58:46,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=835250.0, ans=0.2 2024-08-11 00:58:46,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. 
limit=10.0 2024-08-11 00:58:54,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=835250.0, ans=0.2 2024-08-11 00:58:59,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=835250.0, ans=0.125 2024-08-11 00:58:59,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=835250.0, ans=0.125 2024-08-11 00:59:01,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=12.0 2024-08-11 00:59:03,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=835250.0, ans=0.035 2024-08-11 00:59:05,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835350.0, ans=0.1 2024-08-11 00:59:17,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=835350.0, ans=0.125 2024-08-11 00:59:17,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=835350.0, ans=0.125 2024-08-11 00:59:18,737 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 00:59:45,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=835550.0, ans=0.0 2024-08-11 00:59:51,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=835550.0, ans=0.05 2024-08-11 00:59:58,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11100, loss[loss=0.09979, beats_loss=0.01243, ecapa_loss=0.000224, whisper_loss=0.08512, over 16759.00 frames. 
], tot_loss[loss=0.1091, beats_loss=0.01145, ecapa_loss=0.0002208, whisper_loss=0.09546, over 3943567.76 frames. ], batch size: 68, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:00:00,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-11 01:00:09,382 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 from AS 2024-08-11 01:00:41,762 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 01:00:44,653 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-11 01:00:53,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.772e+01 3.086e+01 3.680e+01 7.620e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 01:01:08,337 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 33 from Vox, 27 from AS 2024-08-11 01:01:19,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11150, loss[loss=0.1223, beats_loss=0.009356, ecapa_loss=0.0002112, whisper_loss=0.1108, over 15252.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01151, ecapa_loss=0.0002179, whisper_loss=0.09471, over 3916088.92 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:01:20,998 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
19 from LS+wenet, 21 from Vox, 38 from AS 2024-08-11 01:01:31,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=836150.0, ans=0.0 2024-08-11 01:01:33,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836250.0, ans=0.1 2024-08-11 01:01:34,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=836250.0, ans=0.125 2024-08-11 01:01:51,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=836350.0, ans=0.2 2024-08-11 01:02:06,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=836450.0, ans=0.125 2024-08-11 01:02:07,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=836450.0, ans=0.0 2024-08-11 01:02:21,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=836550.0, ans=0.125 2024-08-11 01:02:33,815 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 from AS 2024-08-11 01:02:36,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11200, loss[loss=0.08624, beats_loss=0.01589, ecapa_loss=0.0002348, whisper_loss=0.068, over 17513.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01154, ecapa_loss=0.0002186, whisper_loss=0.09469, over 3910997.35 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:02:59,270 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 01:03:02,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836750.0, ans=0.1 2024-08-11 01:03:04,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=836750.0, ans=0.125 2024-08-11 01:03:06,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=836750.0, ans=0.05 2024-08-11 01:03:24,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=836850.0, ans=0.125 2024-08-11 01:03:35,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.788e+01 3.078e+01 3.604e+01 6.278e+01, threshold=6.156e+01, percent-clipped=2.0 2024-08-11 01:04:00,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11250, loss[loss=0.1085, beats_loss=0.01114, ecapa_loss=0.0002106, whisper_loss=0.09526, over 16832.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0115, ecapa_loss=0.0002177, whisper_loss=0.0952, over 3908673.16 frames. ], batch size: 67, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:04:01,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=837150.0, ans=0.125 2024-08-11 01:04:21,567 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 from AS 2024-08-11 01:04:21,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=837250.0, ans=0.0 2024-08-11 01:04:37,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837350.0, ans=0.125 2024-08-11 01:04:42,941 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
23 from LS+wenet, 18 from Vox, 45 from AS 2024-08-11 01:04:54,642 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 01:05:10,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=837550.0, ans=0.1 2024-08-11 01:05:11,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=837550.0, ans=0.2 2024-08-11 01:05:15,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-11 01:05:25,283 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11300, loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.0002212, whisper_loss=0.09238, over 22838.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01158, ecapa_loss=0.0002191, whisper_loss=0.09463, over 3926400.38 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:05:31,343 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
24 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 01:05:47,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837750.0, ans=0.1 2024-08-11 01:06:08,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=837850.0, ans=0.2 2024-08-11 01:06:21,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.704e+01 3.204e+01 3.789e+01 1.454e+02, threshold=6.408e+01, percent-clipped=1.0 2024-08-11 01:06:25,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=837950.0, ans=0.2 2024-08-11 01:06:34,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=838050.0, ans=0.0 2024-08-11 01:06:40,846 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 01:06:45,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11350, loss[loss=0.1086, beats_loss=0.01192, ecapa_loss=0.0002465, whisper_loss=0.09422, over 18649.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01151, ecapa_loss=0.0002191, whisper_loss=0.09472, over 3928958.98 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:06:48,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838150.0, ans=0.1 2024-08-11 01:07:12,379 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 01:07:24,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2024-08-11 01:07:29,286 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 19 from Vox, 31 from AS 2024-08-11 01:07:40,545 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 01:07:42,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838450.0, ans=0.125 2024-08-11 01:07:46,751 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-11 01:07:53,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2024-08-11 01:08:00,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=838550.0, ans=0.07 2024-08-11 01:08:03,394 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11400, loss[loss=0.1088, beats_loss=0.01159, ecapa_loss=0.0001924, whisper_loss=0.09529, over 17434.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01147, ecapa_loss=0.0002182, whisper_loss=0.09447, over 3890317.17 frames. 
], batch size: 70, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:08:39,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=838850.0, ans=0.04949747468305833 2024-08-11 01:08:40,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=838850.0, ans=0.015 2024-08-11 01:08:55,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=838950.0, ans=0.125 2024-08-11 01:08:58,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.886e+01 3.314e+01 4.166e+01 1.030e+02, threshold=6.628e+01, percent-clipped=1.0 2024-08-11 01:09:09,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=839050.0, ans=0.125 2024-08-11 01:09:20,987 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11450, loss[loss=0.1112, beats_loss=0.01148, ecapa_loss=0.0002078, whisper_loss=0.09761, over 19783.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01152, ecapa_loss=0.0002171, whisper_loss=0.09452, over 3920055.12 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:09:24,621 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 from AS 2024-08-11 01:09:48,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=839250.0, ans=0.125 2024-08-11 01:10:08,869 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 from AS 2024-08-11 01:10:10,169 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 14 from Vox, 27 from AS 2024-08-11 01:10:15,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=839450.0, ans=0.125 2024-08-11 01:10:42,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=839650.0, ans=0.125 2024-08-11 01:10:44,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11500, loss[loss=0.1108, beats_loss=0.01134, ecapa_loss=0.0002395, whisper_loss=0.09703, over 22615.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01141, ecapa_loss=0.0002178, whisper_loss=0.09496, over 3911591.90 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:10:49,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=839650.0, ans=0.0 2024-08-11 01:10:52,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-11 01:10:56,496 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-11 01:10:58,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=839750.0, ans=0.0 2024-08-11 01:11:11,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=839750.0, ans=0.125 2024-08-11 01:11:31,468 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 01:11:31,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=839950.0, ans=0.125 2024-08-11 01:11:43,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.719e+01 3.134e+01 3.590e+01 4.797e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-11 01:11:47,128 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 9 from Vox, 30 from AS 2024-08-11 01:11:49,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=840050.0, ans=0.04949747468305833 2024-08-11 01:11:54,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=840050.0, ans=0.0 2024-08-11 01:12:06,916 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11550, loss[loss=0.1004, beats_loss=0.01257, ecapa_loss=0.0001687, whisper_loss=0.08616, over 23195.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01141, ecapa_loss=0.0002174, whisper_loss=0.09505, over 3899870.70 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:12:34,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2024-08-11 01:12:52,902 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 01:13:01,617 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-11 01:13:09,170 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 01:13:24,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=840550.0, ans=0.2 2024-08-11 01:13:27,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11600, loss[loss=0.09968, beats_loss=0.01491, ecapa_loss=0.0001669, whisper_loss=0.0831, over 19566.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01157, ecapa_loss=0.000216, whisper_loss=0.09447, over 3908512.72 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:13:37,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840650.0, ans=0.1 2024-08-11 01:13:42,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=840750.0, ans=0.0 2024-08-11 01:13:49,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-08-11 01:13:52,844 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.094e-02 2024-08-11 01:14:11,115 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 31 from Vox, 36 from AS 2024-08-11 01:14:18,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-11 01:14:23,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.786e+01 3.126e+01 3.591e+01 6.008e+01, threshold=6.251e+01, percent-clipped=0.0 2024-08-11 01:14:47,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11650, loss[loss=0.08037, beats_loss=0.01359, ecapa_loss=0.000197, whisper_loss=0.06481, over 19812.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01156, ecapa_loss=0.0002166, whisper_loss=0.09403, over 3917339.57 frames. ], batch size: 81, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:14:47,204 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 11 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 01:14:49,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2024-08-11 01:14:49,959 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 40 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 01:14:50,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=841150.0, ans=0.0 2024-08-11 01:15:24,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=841350.0, ans=0.125 2024-08-11 01:16:05,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11700, loss[loss=0.09432, beats_loss=0.01225, ecapa_loss=0.000201, whisper_loss=0.08006, over 18005.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01167, ecapa_loss=0.0002159, whisper_loss=0.09398, over 3929684.51 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:16:16,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=841650.0, ans=0.2 2024-08-11 01:16:29,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.38 vs. 
limit=22.5 2024-08-11 01:16:39,834 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.786e+05 2024-08-11 01:16:45,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=841850.0, ans=0.125 2024-08-11 01:16:49,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=841850.0, ans=0.125 2024-08-11 01:16:59,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.883e+01 3.187e+01 3.882e+01 5.856e+01, threshold=6.374e+01, percent-clipped=0.0 2024-08-11 01:17:05,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2024-08-11 01:17:13,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842050.0, ans=0.1 2024-08-11 01:17:16,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=842050.0, ans=0.125 2024-08-11 01:17:23,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11750, loss[loss=0.08606, beats_loss=0.01497, ecapa_loss=0.0001656, whisper_loss=0.06943, over 21164.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01175, ecapa_loss=0.0002155, whisper_loss=0.0933, over 3945156.69 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:17:23,216 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 01:17:32,482 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 27 from Vox, 14 from AS 2024-08-11 01:17:35,172 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
29 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 01:17:42,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2024-08-11 01:17:46,410 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 01:18:32,759 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 01:18:40,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11800, loss[loss=0.107, beats_loss=0.01339, ecapa_loss=0.0002277, whisper_loss=0.09128, over 22500.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0117, ecapa_loss=0.0002164, whisper_loss=0.09444, over 3935721.42 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:18:45,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=842650.0, ans=0.035 2024-08-11 01:18:50,144 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 from AS 2024-08-11 01:18:58,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842750.0, ans=0.0 2024-08-11 01:19:01,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=842750.0, ans=0.125 2024-08-11 01:19:29,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.77 vs. limit=10.0 2024-08-11 01:19:35,911 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 01:19:38,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+01 2.831e+01 3.248e+01 3.772e+01 8.461e+01, threshold=6.495e+01, percent-clipped=3.0 2024-08-11 01:19:40,669 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 01:19:58,204 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 01:20:02,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0 2024-08-11 01:20:03,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11850, loss[loss=0.09306, beats_loss=0.01545, ecapa_loss=0.000176, whisper_loss=0.07585, over 22147.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01173, ecapa_loss=0.0002181, whisper_loss=0.09402, over 3925253.64 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:20:12,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. 
limit=22.5 2024-08-11 01:20:31,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=843250.0, ans=0.125 2024-08-11 01:20:39,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=843350.0, ans=0.2 2024-08-11 01:20:49,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=843450.0, ans=0.125 2024-08-11 01:20:59,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=843450.0, ans=0.125 2024-08-11 01:20:59,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=843450.0, ans=0.025 2024-08-11 01:21:00,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=843450.0, ans=0.125 2024-08-11 01:21:10,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=843550.0, ans=0.125 2024-08-11 01:21:20,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11900, loss[loss=0.1104, beats_loss=0.01251, ecapa_loss=0.0002067, whisper_loss=0.09585, over 16675.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0117, ecapa_loss=0.0002187, whisper_loss=0.09457, over 3960417.96 frames. ], batch size: 64, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:21:30,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843650.0, ans=0.125 2024-08-11 01:22:00,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=843850.0, ans=0.125 2024-08-11 01:22:02,074 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 01:22:02,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=843850.0, ans=0.0 2024-08-11 01:22:13,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.860e+01 3.257e+01 3.543e+01 6.146e+01, threshold=6.513e+01, percent-clipped=0.0 2024-08-11 01:22:16,594 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 01:22:20,065 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 01:22:21,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-08-11 01:22:23,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=844050.0, ans=0.1 2024-08-11 01:22:32,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=844050.0, ans=0.125 2024-08-11 01:22:34,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 11950, loss[loss=0.1001, beats_loss=0.01188, ecapa_loss=0.0002392, whisper_loss=0.08583, over 16639.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01171, ecapa_loss=0.0002189, whisper_loss=0.09407, over 3931354.27 frames. ], batch size: 69, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:22:35,088 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 01:22:59,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=844250.0, ans=0.2 2024-08-11 01:23:14,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=844350.0, ans=0.0 2024-08-11 01:23:16,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844350.0, ans=0.125 2024-08-11 01:23:21,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=844350.0, ans=0.0 2024-08-11 01:23:26,122 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 01:23:31,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=844450.0, ans=0.0 2024-08-11 01:23:36,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=844450.0, ans=0.1 2024-08-11 01:23:38,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=844550.0, ans=22.5 2024-08-11 01:23:42,117 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 01:23:42,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=844550.0, ans=0.1 2024-08-11 01:23:42,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. 
limit=10.0 2024-08-11 01:23:44,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=844550.0, ans=0.05 2024-08-11 01:23:53,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12000, loss[loss=0.1228, beats_loss=0.01034, ecapa_loss=0.0001902, whisper_loss=0.1105, over 17922.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01166, ecapa_loss=0.000218, whisper_loss=0.09446, over 3917281.59 frames. ], batch size: 68, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:23:53,700 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 01:24:32,739 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on ASR_libri: loss=0.2603, beats_loss=0, ecapa_loss=0.0006879, whisper_loss=0.2534, over 922467.00 frames. 2024-08-11 01:24:45,042 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.7814, 2.1702, 2.4844, 1.8782, 2.6256, 2.6532, 2.5052, 2.4692], device='cuda:1') 2024-08-11 01:24:50,755 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 01:26:27,178 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6333, 1.9739, 1.5824, 1.4593, 1.3008, 1.3051, 1.6675, 1.5423], device='cuda:1') 2024-08-11 01:26:40,321 INFO [train_multi_KD3.py:1149] (1/4) Epoch 6, validation on AT_audioset: loss=0.02599, beats_loss=0.02599, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 01:26:40,324 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 01:26:40,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=844650.0, ans=0.125 2024-08-11 01:26:42,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=844650.0, ans=0.05 2024-08-11 01:27:26,388 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2024-08-11 01:27:35,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.989e+01 3.252e+01 3.842e+01 6.267e+01, threshold=6.505e+01, percent-clipped=0.0 2024-08-11 01:27:47,223 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 01:27:47,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=845050.0, ans=0.0 2024-08-11 01:27:52,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=845050.0, ans=0.125 2024-08-11 01:28:00,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12050, loss[loss=0.1097, beats_loss=0.01168, ecapa_loss=0.0002284, whisper_loss=0.09576, over 15270.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01165, ecapa_loss=0.0002194, whisper_loss=0.09378, over 3896815.96 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:28:19,872 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 01:28:40,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=845350.0, ans=0.0 2024-08-11 01:28:57,890 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 01:29:08,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-11 01:29:17,324 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12100, loss[loss=0.1165, beats_loss=0.0116, ecapa_loss=0.0001845, whisper_loss=0.103, over 23283.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01166, ecapa_loss=0.0002195, whisper_loss=0.09403, over 3885807.87 frames. ], batch size: 93, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:29:37,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=845750.0, ans=0.0 2024-08-11 01:29:51,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=845850.0, ans=0.04949747468305833 2024-08-11 01:29:54,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=845850.0, ans=0.2 2024-08-11 01:29:55,328 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 16 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 01:30:10,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.614e+01 2.881e+01 3.224e+01 5.170e+01, threshold=5.763e+01, percent-clipped=0.0 2024-08-11 01:30:28,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=846050.0, ans=0.125 2024-08-11 01:30:30,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=846050.0, ans=0.0 2024-08-11 01:30:32,845 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12150, loss[loss=0.1239, beats_loss=0.01097, ecapa_loss=0.0001674, whisper_loss=0.1112, over 15186.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0116, ecapa_loss=0.0002203, whisper_loss=0.09349, over 3834498.95 frames. 
], batch size: 56, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:30:49,655 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-11 01:30:57,985 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 01:31:21,312 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 01:31:32,096 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 01:31:32,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=846450.0, ans=0.125 2024-08-11 01:31:49,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12200, loss[loss=0.1073, beats_loss=0.01412, ecapa_loss=0.0002054, whisper_loss=0.0911, over 17200.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01156, ecapa_loss=0.000218, whisper_loss=0.09403, over 3856855.38 frames. ], batch size: 70, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:32:02,821 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 01:32:17,273 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 01:32:27,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=846850.0, ans=0.0 2024-08-11 01:32:34,274 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
20 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 01:32:43,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.785e+01 3.124e+01 3.706e+01 5.181e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-11 01:32:48,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=846950.0, ans=0.125 2024-08-11 01:33:04,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2024-08-11 01:33:08,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12250, loss[loss=0.1155, beats_loss=0.009215, ecapa_loss=0.0002844, whisper_loss=0.1034, over 15279.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01156, ecapa_loss=0.0002194, whisper_loss=0.09324, over 3829061.85 frames. ], batch size: 62, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:33:21,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=847150.0, ans=0.125 2024-08-11 01:33:38,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-11 01:33:39,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=847350.0, ans=0.07 2024-08-11 01:33:41,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847350.0, ans=0.1 2024-08-11 01:33:44,344 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 01:33:53,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2024-08-11 01:34:28,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12300, loss[loss=0.09798, beats_loss=0.01204, ecapa_loss=0.0002296, whisper_loss=0.08364, over 22470.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01157, ecapa_loss=0.0002191, whisper_loss=0.09344, over 3842853.02 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:34:43,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2024-08-11 01:35:12,769 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 01:35:17,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=847950.0, ans=0.2 2024-08-11 01:35:24,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.835e+01 3.125e+01 3.646e+01 6.261e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 01:35:29,468 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 01:35:32,968 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 01:35:36,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=848050.0, ans=0.0 2024-08-11 01:35:45,306 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 01:35:47,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12350, loss[loss=0.08757, beats_loss=0.01357, ecapa_loss=0.0002037, whisper_loss=0.07197, over 23519.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0116, ecapa_loss=0.0002192, whisper_loss=0.0932, over 3840863.13 frames. 
], batch size: 94, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:35:58,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=848150.0, ans=0.0 2024-08-11 01:36:01,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=848250.0, ans=0.125 2024-08-11 01:36:32,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848450.0, ans=0.125 2024-08-11 01:36:37,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=848450.0, ans=0.125 2024-08-11 01:36:39,156 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 01:37:02,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12400, loss[loss=0.1269, beats_loss=0.01145, ecapa_loss=0.0001833, whisper_loss=0.1137, over 22856.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01158, ecapa_loss=0.0002176, whisper_loss=0.09387, over 3837967.45 frames. ], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:37:20,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848750.0, ans=0.125 2024-08-11 01:37:41,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=848850.0, ans=0.125 2024-08-11 01:37:54,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.646e+01 2.993e+01 3.533e+01 4.877e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 01:37:56,391 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 01:38:03,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=849050.0, ans=0.125 2024-08-11 01:38:17,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12450, loss[loss=0.09849, beats_loss=0.01323, ecapa_loss=0.0001835, whisper_loss=0.08342, over 13606.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01156, ecapa_loss=0.0002166, whisper_loss=0.09373, over 3838574.77 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:38:18,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2024-08-11 01:38:18,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=15.0 2024-08-11 01:38:30,942 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 01:38:35,024 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 01:38:58,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=849350.0, ans=0.125 2024-08-11 01:39:15,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=849450.0, ans=0.0 2024-08-11 01:39:26,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=849550.0, ans=0.125 2024-08-11 01:39:32,593 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12500, loss[loss=0.1191, beats_loss=0.009248, ecapa_loss=0.0002321, whisper_loss=0.1076, over 17855.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01149, ecapa_loss=0.0002157, whisper_loss=0.0947, over 3873081.54 frames. 
], batch size: 69, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:39:32,777 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 23 from LS+wenet, 40 from Vox, 32 fro AS 2024-08-11 01:40:10,165 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 01:40:10,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=849850.0, ans=0.125 2024-08-11 01:40:18,152 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-11 01:40:28,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.816e+01 3.133e+01 3.791e+01 6.148e+01, threshold=6.266e+01, percent-clipped=1.0 2024-08-11 01:40:34,781 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 01:40:43,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=850050.0, ans=0.125 2024-08-11 01:40:51,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12550, loss[loss=0.1047, beats_loss=0.01256, ecapa_loss=0.0002076, whisper_loss=0.09003, over 23125.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0115, ecapa_loss=0.000217, whisper_loss=0.09511, over 3914341.49 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:41:09,572 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:41:19,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=850250.0, ans=0.0 2024-08-11 01:41:29,065 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 01:41:29,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850350.0, ans=0.1 2024-08-11 01:41:35,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-11 01:41:41,030 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 01:41:43,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-08-11 01:42:10,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12600, loss[loss=0.1111, beats_loss=0.01018, ecapa_loss=0.0001932, whisper_loss=0.09898, over 23139.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002178, whisper_loss=0.09415, over 3891678.46 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:42:22,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=850650.0, ans=0.0 2024-08-11 01:42:29,393 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 01:42:29,727 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.332e+05 2024-08-11 01:42:31,470 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 01:42:34,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=850750.0, ans=0.0 2024-08-11 01:42:34,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=850750.0, ans=0.0 2024-08-11 01:42:38,083 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
23 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 01:42:45,499 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 01:42:49,977 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.236e+05 2024-08-11 01:43:03,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=850950.0, ans=0.125 2024-08-11 01:43:06,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.981e+01 3.398e+01 4.026e+01 7.168e+01, threshold=6.796e+01, percent-clipped=1.0 2024-08-11 01:43:12,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.14 vs. limit=15.0 2024-08-11 01:43:15,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=851050.0, ans=0.0 2024-08-11 01:43:16,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=851050.0, ans=0.125 2024-08-11 01:43:29,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12650, loss[loss=0.08292, beats_loss=0.01013, ecapa_loss=0.0003005, whisper_loss=0.06979, over 13453.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01158, ecapa_loss=0.0002192, whisper_loss=0.09411, over 3904349.54 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:43:32,920 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 01:43:51,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.08 vs. limit=22.5 2024-08-11 01:43:52,845 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 01:43:59,230 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 01:44:27,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=851450.0, ans=0.0 2024-08-11 01:44:30,282 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 01:44:39,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-11 01:44:44,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=851550.0, ans=0.125 2024-08-11 01:44:45,600 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 01:44:48,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12700, loss[loss=0.1078, beats_loss=0.01322, ecapa_loss=0.0001781, whisper_loss=0.09284, over 16401.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01168, ecapa_loss=0.0002166, whisper_loss=0.09359, over 3914801.44 frames. ], batch size: 64, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:44:49,732 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 01:44:51,408 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 01:44:52,809 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 01:45:02,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=851750.0, ans=0.0 2024-08-11 01:45:06,837 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-11 01:45:09,494 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 01:45:17,296 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 01:45:34,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=851950.0, ans=0.0 2024-08-11 01:45:40,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.734e+01 2.989e+01 3.425e+01 5.621e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-11 01:45:41,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2024-08-11 01:45:54,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=852050.0, ans=0.2 2024-08-11 01:45:54,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2024-08-11 01:46:04,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12750, loss[loss=0.1165, beats_loss=0.0117, ecapa_loss=0.0002291, whisper_loss=0.1025, over 22146.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01164, ecapa_loss=0.0002177, whisper_loss=0.09406, over 3918461.23 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:46:34,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=852350.0, ans=0.125 2024-08-11 01:46:35,556 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
19 from LS+wenet, 15 from Vox, 44 from AS 2024-08-11 01:46:38,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=852350.0, ans=0.125 2024-08-11 01:46:46,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=852350.0, ans=0.05 2024-08-11 01:46:55,466 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-11 01:46:59,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852450.0, ans=0.1 2024-08-11 01:46:59,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=852450.0, ans=0.125 2024-08-11 01:47:02,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=22.5 2024-08-11 01:47:09,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=852550.0, ans=0.125 2024-08-11 01:47:19,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12800, loss[loss=0.0978, beats_loss=0.01226, ecapa_loss=0.0001884, whisper_loss=0.08365, over 18080.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01174, ecapa_loss=0.0002178, whisper_loss=0.09341, over 3921015.98 frames. ], batch size: 70, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:47:28,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=852650.0, ans=0.0 2024-08-11 01:47:34,083 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
20 from LS+wenet, 23 from Vox, 34 from AS 2024-08-11 01:47:40,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=15.0 2024-08-11 01:47:52,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852850.0, ans=0.0 2024-08-11 01:47:54,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=852850.0, ans=0.0 2024-08-11 01:48:04,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=852950.0, ans=0.0 2024-08-11 01:48:09,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.804e+01 3.213e+01 3.707e+01 6.106e+01, threshold=6.425e+01, percent-clipped=1.0 2024-08-11 01:48:16,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853050.0, ans=0.1 2024-08-11 01:48:21,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=853050.0, ans=0.0 2024-08-11 01:48:30,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12850, loss[loss=0.1087, beats_loss=0.01245, ecapa_loss=0.0001794, whisper_loss=0.0945, over 17269.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0117, ecapa_loss=0.0002191, whisper_loss=0.09325, over 3887469.93 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:48:43,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. 
limit=15.0 2024-08-11 01:48:44,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=853250.0, ans=0.0 2024-08-11 01:48:54,544 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 01:49:20,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853450.0, ans=0.125 2024-08-11 01:49:25,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=853450.0, ans=0.125 2024-08-11 01:49:30,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=853550.0, ans=0.2 2024-08-11 01:49:40,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=853650.0, ans=0.2 2024-08-11 01:49:41,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12900, loss[loss=0.08758, beats_loss=0.01076, ecapa_loss=0.0002529, whisper_loss=0.07429, over 20741.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01161, ecapa_loss=0.000219, whisper_loss=0.0939, over 3875730.83 frames. ], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:49:45,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=853650.0, ans=0.125 2024-08-11 01:49:49,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853650.0, ans=0.125 2024-08-11 01:49:59,932 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 01:50:01,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=853750.0, ans=0.125 2024-08-11 01:50:02,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=853750.0, ans=0.5 2024-08-11 01:50:05,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853750.0, ans=0.1 2024-08-11 01:50:08,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=853850.0, ans=0.0 2024-08-11 01:50:15,531 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS 2024-08-11 01:50:32,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.643e+01 2.887e+01 3.353e+01 5.409e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 01:50:38,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=854050.0, ans=0.125 2024-08-11 01:50:40,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2024-08-11 01:50:46,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=854050.0, ans=0.125 2024-08-11 01:50:47,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=854050.0, ans=0.2 2024-08-11 01:50:49,430 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 23 from Vox, 24 from AS 2024-08-11 01:50:52,452 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 11 from Vox, 32 from AS 2024-08-11 01:50:55,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 12950, loss[loss=0.1077, beats_loss=0.01143, ecapa_loss=0.0002179, whisper_loss=0.09404, over 21507.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01161, ecapa_loss=0.0002182, whisper_loss=0.09323, over 3864815.22 frames. ], batch size: 86, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:50:58,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854150.0, ans=0.1 2024-08-11 01:51:35,984 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 16 from Vox, 36 from AS 2024-08-11 01:51:40,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0 2024-08-11 01:51:44,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=854450.0, ans=0.07 2024-08-11 01:52:07,411 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 from AS 2024-08-11 01:52:11,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13000, loss[loss=0.09051, beats_loss=0.01234, ecapa_loss=0.0002332, whisper_loss=0.07584, over 20688.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01154, ecapa_loss=0.0002191, whisper_loss=0.09422, over 3907301.72 frames. ], batch size: 87, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:52:19,080 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 01:52:28,983 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2024-08-11 01:52:35,628 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
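The `tot_loss` values in the records above are consistent with a weighted sum of the per-teacher distillation losses using the `*_loss_scale` entries from the run config in the header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that relationship (the helper name is illustrative, not the actual training code):

```python
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three teacher losses, as suggested by the
    *_loss_scale values in the logged config (an assumption drawn from
    the log, not a copy of the training code)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Check against the "batch 13000" record above:
loss = combined_loss(beats_loss=0.01154, ecapa_loss=0.0002191,
                     whisper_loss=0.09422)
print(round(loss, 4))  # close to the logged tot_loss of 0.108
```

Note that `ecapa_loss` is roughly 50x smaller than the other terms, which is presumably why it carries the 10.0 scale.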
17 from LS+wenet, 19 from Vox, 23 from AS 2024-08-11 01:53:00,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=854950.0, ans=0.125 2024-08-11 01:53:06,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.697e+01 3.039e+01 3.659e+01 7.134e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-11 01:53:08,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=12.0 2024-08-11 01:53:15,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=855050.0, ans=0.125 2024-08-11 01:53:23,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=855050.0, ans=0.0 2024-08-11 01:53:29,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13050, loss[loss=0.1132, beats_loss=0.01301, ecapa_loss=0.0001822, whisper_loss=0.0984, over 21868.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01162, ecapa_loss=0.0002186, whisper_loss=0.09351, over 3910878.09 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:53:31,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855150.0, ans=0.1 2024-08-11 01:53:36,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=855150.0, ans=0.125 2024-08-11 01:53:40,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=855150.0, ans=0.0 2024-08-11 01:53:43,607 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-11 01:53:45,108 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 23 from Vox, 23 from AS 2024-08-11 01:53:53,374 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 01:53:56,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=855250.0, ans=0.0 2024-08-11 01:54:04,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=12.0 2024-08-11 01:54:18,710 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 from AS 2024-08-11 01:54:32,633 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 01:54:38,977 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 16 from Vox, 43 from AS 2024-08-11 01:54:47,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13100, loss[loss=0.08902, beats_loss=0.01299, ecapa_loss=0.0001805, whisper_loss=0.07423, over 14962.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01161, ecapa_loss=0.0002176, whisper_loss=0.0935, over 3891038.09 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:54:50,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=855650.0, ans=0.0 2024-08-11 01:54:51,608 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 01:55:01,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855750.0, ans=0.0 2024-08-11 01:55:03,832 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-11 01:55:10,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=855750.0, ans=0.125 2024-08-11 01:55:12,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2024-08-11 01:55:14,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=855750.0, ans=0.125 2024-08-11 01:55:39,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2024-08-11 01:55:44,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.870e+01 3.154e+01 3.850e+01 5.715e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 01:56:01,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=856050.0, ans=0.125 2024-08-11 01:56:08,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13150, loss[loss=0.07846, beats_loss=0.01563, ecapa_loss=0.0001927, whisper_loss=0.0609, over 21750.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01167, ecapa_loss=0.0002162, whisper_loss=0.09342, over 3880802.30 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:56:13,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=856150.0, ans=0.2 2024-08-11 01:56:41,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856350.0, ans=0.1 2024-08-11 01:56:47,009 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
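In each `Clipping_scale=2.0` record above, `optim.py` logs the grad-norm quartiles (min/25%/median/75%/max) together with a clipping threshold, and the threshold consistently equals 2.0 times the logged median (e.g. 2.0 x 3.154e+01 = 6.308e+01 in the record just above). A sketch of that relationship, treating the median estimate as given (the real optimizer maintains a running estimate, which is omitted here):

```python
def clipping_threshold(grad_norm_median, clipping_scale=2.0):
    # Threshold is clipping_scale times the median grad norm, matching
    # the logged records; gradients whose norm exceeds it get clipped
    # (counted in "percent-clipped"). Running-median bookkeeping omitted.
    return clipping_scale * grad_norm_median

# Record above: quartiles ... median=3.154e+01 ..., threshold=6.308e+01
print(clipping_threshold(3.154e+01))  # prints 63.08
```

This adaptive threshold explains why "percent-clipped" stays near 0-1%: only outlier batches (like the 1.415e+02 max seen later in this log) exceed twice the typical grad norm.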
14 from LS+wenet, 21 from Vox, 25 from AS 2024-08-11 01:56:50,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=856350.0, ans=0.0 2024-08-11 01:57:04,111 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 29 from LS+wenet, 14 from Vox, 23 from AS 2024-08-11 01:57:17,610 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS 2024-08-11 01:57:19,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=856550.0, ans=0.0 2024-08-11 01:57:25,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13200, loss[loss=0.0981, beats_loss=0.01254, ecapa_loss=0.0001919, whisper_loss=0.08364, over 15360.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01164, ecapa_loss=0.0002167, whisper_loss=0.09421, over 3855199.68 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:57:29,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856650.0, ans=0.1 2024-08-11 01:57:31,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856650.0, ans=0.125 2024-08-11 01:57:46,520 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
24 from LS+wenet, 27 from Vox, 30 from AS 2024-08-11 01:57:46,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=856750.0, ans=0.0 2024-08-11 01:57:49,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=856750.0, ans=0.125 2024-08-11 01:58:17,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.805e+01 3.191e+01 3.827e+01 5.209e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 01:58:30,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=857050.0, ans=0.0 2024-08-11 01:58:38,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13250, loss[loss=0.1173, beats_loss=0.01112, ecapa_loss=0.0001877, whisper_loss=0.1043, over 18434.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01154, ecapa_loss=0.0002179, whisper_loss=0.09456, over 3827421.77 frames. ], batch size: 69, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:58:59,246 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 from AS 2024-08-11 01:58:59,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=857250.0, ans=0.125 2024-08-11 01:59:08,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=857350.0, ans=0.125 2024-08-11 01:59:09,271 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 26 from Vox, 36 from AS 2024-08-11 01:59:23,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=857450.0, ans=0.07 2024-08-11 01:59:28,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857450.0, ans=0.1 2024-08-11 01:59:47,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=857650.0, ans=0.125 2024-08-11 01:59:49,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13300, loss[loss=0.1299, beats_loss=0.009967, ecapa_loss=0.0002332, whisper_loss=0.1176, over 22342.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0115, ecapa_loss=0.0002171, whisper_loss=0.09526, over 3853602.38 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:00:02,495 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 02:00:03,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=857750.0, ans=0.2 2024-08-11 02:00:15,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=857750.0, ans=0.125 2024-08-11 02:00:27,293 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 from AS 2024-08-11 02:00:32,452 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 8 from Vox, 27 from AS 2024-08-11 02:00:37,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.722e+01 2.995e+01 3.352e+01 6.535e+01, threshold=5.989e+01, percent-clipped=1.0 2024-08-11 02:00:48,813 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
27 from LS+wenet, 24 from Vox, 45 from AS 2024-08-11 02:00:57,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13350, loss[loss=0.1188, beats_loss=0.009921, ecapa_loss=0.000262, whisper_loss=0.1063, over 19457.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01159, ecapa_loss=0.0002182, whisper_loss=0.09456, over 3864430.36 frames. ], batch size: 78, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:00:58,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858150.0, ans=0.1 2024-08-11 02:00:59,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=858150.0, ans=0.0 2024-08-11 02:01:08,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=858150.0, ans=0.0 2024-08-11 02:01:10,722 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 02:01:11,165 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=12.0 2024-08-11 02:01:11,859 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 from AS 2024-08-11 02:01:29,672 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 02:01:29,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=858350.0, ans=0.0 2024-08-11 02:01:30,993 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
15 from LS+wenet, 16 from Vox, 32 from AS 2024-08-11 02:01:31,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858350.0, ans=0.1 2024-08-11 02:01:45,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858450.0, ans=0.1 2024-08-11 02:01:46,786 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS 2024-08-11 02:01:52,336 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-11 02:01:55,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2024-08-11 02:02:04,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13400, loss[loss=0.09964, beats_loss=0.01072, ecapa_loss=0.0003279, whisper_loss=0.08564, over 14088.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01163, ecapa_loss=0.0002183, whisper_loss=0.09403, over 3850879.65 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 02:02:11,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-11 02:02:30,673 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 02:02:35,925 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 29 from Vox, 28 from AS 2024-08-11 02:02:44,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2024-08-11 02:02:51,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.837e+01 3.208e+01 3.826e+01 8.458e+01, threshold=6.417e+01, percent-clipped=4.0 2024-08-11 02:02:55,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=858950.0, ans=10.0 2024-08-11 02:03:04,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859050.0, ans=0.1 2024-08-11 02:03:11,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13450, loss[loss=0.125, beats_loss=0.01142, ecapa_loss=0.0002165, whisper_loss=0.1114, over 21433.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01161, ecapa_loss=0.0002181, whisper_loss=0.09371, over 3838144.80 frames. ], batch size: 86, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:03:39,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-08-11 02:03:46,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.55 vs. limit=10.0 2024-08-11 02:03:47,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=859350.0, ans=0.125 2024-08-11 02:03:49,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859450.0, ans=0.1 2024-08-11 02:03:54,581 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 21 from Vox, 40 from AS 2024-08-11 02:03:56,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=859450.0, ans=0.0 2024-08-11 02:04:12,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=859550.0, ans=0.1 2024-08-11 02:04:18,180 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13500, loss[loss=0.1226, beats_loss=0.008977, ecapa_loss=0.0001929, whisper_loss=0.1117, over 20314.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01156, ecapa_loss=0.0002166, whisper_loss=0.0937, over 3848262.01 frames. ], batch size: 73, lr: 1.00e-02, grad_scale: 140737488355328.0 2024-08-11 02:04:21,206 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 from AS 2024-08-11 02:04:21,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=859650.0, ans=0.0 2024-08-11 02:04:24,065 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 02:04:48,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2024-08-11 02:04:55,820 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS 2024-08-11 02:05:04,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.793e+01 3.249e+01 3.860e+01 6.225e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-11 02:05:11,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=860050.0, ans=0.0 2024-08-11 02:05:24,745 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13550, loss[loss=0.09766, beats_loss=0.01183, ecapa_loss=0.0002168, whisper_loss=0.08366, over 22149.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01164, ecapa_loss=0.0002148, whisper_loss=0.09353, over 3852773.84 frames. ], batch size: 88, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:05:25,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=860150.0, ans=0.125 2024-08-11 02:05:26,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=860150.0, ans=0.125 2024-08-11 02:05:31,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=860150.0, ans=0.0 2024-08-11 02:05:31,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.87 vs. limit=10.0 2024-08-11 02:05:55,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-11 02:06:24,199 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 20 from Vox, 50 from AS 2024-08-11 02:06:24,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-11 02:06:34,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13600, loss[loss=0.0904, beats_loss=0.01386, ecapa_loss=0.0002085, whisper_loss=0.07445, over 20072.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01158, ecapa_loss=0.0002143, whisper_loss=0.09361, over 3853020.17 frames. 
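At batch 13550 the logged `grad_scale` jumps from 140737488355328.0 (2^47) to 281474976710656.0 (2^48). This is the usual power-of-two growth of a dynamic AMP loss scale after a long run of overflow-free steps. A minimal sketch of that update rule (the growth interval and factor are assumptions for illustration; the actual values used by this run are not in the log):

```python
def update_grad_scale(scale, overflowed, good_steps, growth_interval=2000):
    """Dynamic loss-scale update: halve on overflow, double after
    growth_interval consecutive good steps. This is the common AMP
    scheme, not a transcript of the training code; the interval is
    a placeholder. Returns (new_scale, new_good_step_count)."""
    if overflowed:
        return scale / 2.0, 0      # back off and restart the streak
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0      # streak complete: grow by 2x
    return scale, good_steps

scale = 2.0 ** 47                  # 140737488355328.0, as logged earlier
scale, good = update_grad_scale(scale, overflowed=False, good_steps=1999)
print(scale)                       # 281474976710656.0, i.e. 2 ** 48
```

Since the run uses bf16 (`use_bf16: True`) rather than fp16, the scale can grow very large without overflow, which is consistent with the 2^47-2^48 values seen here.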
], batch size: 85, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:07:07,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=860850.0, ans=0.125 2024-08-11 02:07:23,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.998e+01 3.369e+01 4.005e+01 6.707e+01, threshold=6.738e+01, percent-clipped=1.0 2024-08-11 02:07:27,216 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 02:07:32,141 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.532e-02 2024-08-11 02:07:40,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=861050.0, ans=0.125 2024-08-11 02:07:44,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13650, loss[loss=0.1148, beats_loss=0.01104, ecapa_loss=0.0001905, whisper_loss=0.1018, over 17450.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0116, ecapa_loss=0.0002144, whisper_loss=0.09392, over 3847208.11 frames. ], batch size: 67, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:07:50,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-11 02:08:31,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=861450.0, ans=0.05 2024-08-11 02:08:46,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=861550.0, ans=0.125 2024-08-11 02:08:54,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13700, loss[loss=0.09307, beats_loss=0.008814, ecapa_loss=0.0002466, whisper_loss=0.08179, over 22459.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.0117, ecapa_loss=0.0002139, whisper_loss=0.09273, over 3830214.72 frames. ], batch size: 92, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:09:23,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861850.0, ans=0.1 2024-08-11 02:09:29,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=861850.0, ans=0.0 2024-08-11 02:09:44,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.742e+01 3.072e+01 3.573e+01 1.415e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 02:09:48,164 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS 2024-08-11 02:10:05,030 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13750, loss[loss=0.112, beats_loss=0.01199, ecapa_loss=0.0001953, whisper_loss=0.09801, over 21859.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01171, ecapa_loss=0.0002139, whisper_loss=0.09277, over 3835256.75 frames. ], batch size: 83, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:10:09,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2024-08-11 02:10:13,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=862150.0, ans=0.125 2024-08-11 02:10:20,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=862250.0, ans=0.2 2024-08-11 02:10:26,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=862250.0, ans=0.025 2024-08-11 02:10:47,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-11 02:10:54,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=862450.0, ans=0.125 2024-08-11 02:10:57,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=862450.0, ans=0.125 2024-08-11 02:11:04,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=862550.0, ans=0.0 2024-08-11 02:11:07,173 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 from AS 2024-08-11 02:11:07,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=862550.0, ans=0.125 2024-08-11 02:11:10,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.0 2024-08-11 02:11:14,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.22 vs. 
limit=15.0 2024-08-11 02:11:14,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13800, loss[loss=0.1167, beats_loss=0.01174, ecapa_loss=0.0001965, whisper_loss=0.103, over 18837.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0117, ecapa_loss=0.0002127, whisper_loss=0.09347, over 3850285.95 frames. ], batch size: 76, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:11:34,048 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 02:11:52,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862850.0, ans=0.1 2024-08-11 02:12:04,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.614e+01 2.961e+01 3.435e+01 1.383e+02, threshold=5.922e+01, percent-clipped=1.0 2024-08-11 02:12:07,947 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 from AS 2024-08-11 02:12:14,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-11 02:12:26,438 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13850, loss[loss=0.08737, beats_loss=0.01179, ecapa_loss=0.0002106, whisper_loss=0.07348, over 21627.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01163, ecapa_loss=0.0002142, whisper_loss=0.09375, over 3876905.71 frames. ], batch size: 92, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:12:35,376 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 02:12:39,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=863250.0, ans=0.125 2024-08-11 02:12:49,428 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 02:13:00,576 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
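Each ScheduledFloat record above reports a parameter name, the current batch_count, and the value in effect at that point (e.g. `dropout_p` at 0.1, various skip rates at 0.0, balancer probs at 0.125). The underlying idea is a scalar hyperparameter defined by piecewise-linear interpolation over batch_count; a simplified sketch of that mechanism, with invented schedule points for illustration (the real class in `scaling.py` carries more machinery):

```python
import bisect

class ScheduledFloat:
    """Piecewise-linear schedule over batch_count (simplified sketch;
    the (batch_count, value) points below are illustrative only)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count):
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]          # before the first point: clamp
        if i == len(self.xs):
            return self.ys[-1]         # after the last point: clamp
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Example: a skip rate annealed from 0.5 to 0.0 over the first 20000 batches.
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(852850.0))  # 0.0, consistent with late-training records
```

This would explain why, this deep into training (batch_count around 860000), nearly every logged skip rate reads 0.0: the schedules have long since annealed to their final values.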
30 from LS+wenet, 13 from Vox, 16 from AS 2024-08-11 02:13:03,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2024-08-11 02:13:06,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=863350.0, ans=0.125 2024-08-11 02:13:28,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-11 02:13:32,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=863550.0, ans=0.1 2024-08-11 02:13:36,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13900, loss[loss=0.1268, beats_loss=0.009693, ecapa_loss=0.0002363, whisper_loss=0.1147, over 22642.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002147, whisper_loss=0.09481, over 3884912.94 frames. ], batch size: 93, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:14:01,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-08-11 02:14:02,130 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 from AS 2024-08-11 02:14:06,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=863850.0, ans=0.125 2024-08-11 02:14:10,906 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 02:14:23,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.760e+01 3.035e+01 3.739e+01 6.215e+01, threshold=6.069e+01, percent-clipped=1.0 2024-08-11 02:14:42,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 13950, loss[loss=0.07962, beats_loss=0.01294, ecapa_loss=0.0002007, whisper_loss=0.06467, over 17919.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01149, ecapa_loss=0.0002144, whisper_loss=0.09472, over 3883189.54 frames. ], batch size: 73, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:14:49,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-11 02:14:54,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=864250.0, ans=0.125 2024-08-11 02:14:57,791 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 02:15:03,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=864250.0, ans=0.2 2024-08-11 02:15:06,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864250.0, ans=0.1 2024-08-11 02:15:07,358 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 24 from Vox, 25 from AS 2024-08-11 02:15:13,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=864350.0, ans=0.125 2024-08-11 02:15:19,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=864350.0, ans=0.125 2024-08-11 02:15:34,516 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-11 02:15:35,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=864550.0, ans=0.025 2024-08-11 02:15:47,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14000, loss[loss=0.1185, beats_loss=0.009289, ecapa_loss=0.0002204, whisper_loss=0.1071, over 18201.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01163, ecapa_loss=0.0002126, whisper_loss=0.09369, over 3877935.76 frames. ], batch size: 70, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:15:47,837 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 02:15:58,324 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 from AS 2024-08-11 02:16:20,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=864850.0, ans=0.0 2024-08-11 02:16:29,602 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 02:16:33,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.879e+01 3.227e+01 3.709e+01 6.302e+01, threshold=6.454e+01, percent-clipped=1.0 2024-08-11 02:16:52,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14050, loss[loss=0.09207, beats_loss=0.01059, ecapa_loss=0.0002017, whisper_loss=0.07946, over 14740.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002147, whisper_loss=0.09417, over 3880313.40 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:16:53,066 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 02:17:06,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865250.0, ans=0.1 2024-08-11 02:17:11,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=865250.0, ans=0.025 2024-08-11 02:17:16,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=865250.0, ans=0.125 2024-08-11 02:17:17,466 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 02:17:29,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=865350.0, ans=0.125 2024-08-11 02:17:33,139 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 02:17:39,229 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-08-11 02:17:57,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14100, loss[loss=0.1166, beats_loss=0.01087, ecapa_loss=0.0002368, whisper_loss=0.1034, over 21424.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01165, ecapa_loss=0.0002131, whisper_loss=0.09357, over 3846721.12 frames. 
], batch size: 85, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:17:58,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865650.0, ans=0.1 2024-08-11 02:18:12,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=865750.0, ans=0.0 2024-08-11 02:18:38,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=865950.0, ans=0.0 2024-08-11 02:18:44,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.713e+01 2.992e+01 3.543e+01 5.369e+01, threshold=5.983e+01, percent-clipped=0.0 2024-08-11 02:19:04,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14150, loss[loss=0.1132, beats_loss=0.01304, ecapa_loss=0.0001803, whisper_loss=0.09835, over 20054.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01163, ecapa_loss=0.000213, whisper_loss=0.09345, over 3839765.86 frames. ], batch size: 79, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:19:17,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=866250.0, ans=0.2 2024-08-11 02:19:28,477 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 02:19:34,242 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.520e-01 2024-08-11 02:19:39,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866350.0, ans=0.1 2024-08-11 02:19:45,659 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 02:20:00,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=866550.0, ans=0.09899494936611666 2024-08-11 02:20:10,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14200, loss[loss=0.09841, beats_loss=0.01448, ecapa_loss=0.0001935, whisper_loss=0.082, over 23370.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01164, ecapa_loss=0.0002124, whisper_loss=0.09302, over 3830798.51 frames. ], batch size: 96, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:20:25,301 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 02:20:27,720 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 19 from LS+wenet, 25 from Vox, 52 fro AS 2024-08-11 02:20:28,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2024-08-11 02:20:34,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2024-08-11 02:20:41,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=866850.0, ans=0.0 2024-08-11 02:20:54,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-11 02:20:57,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.819e+01 3.173e+01 3.823e+01 7.553e+01, threshold=6.347e+01, percent-clipped=1.0 2024-08-11 02:21:19,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14250, loss[loss=0.1316, beats_loss=0.009675, ecapa_loss=0.0002218, whisper_loss=0.1197, over 23151.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01159, ecapa_loss=0.0002117, whisper_loss=0.09391, over 3867137.60 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:21:25,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867150.0, ans=0.1 2024-08-11 02:21:32,659 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 02:21:43,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=867250.0, ans=0.2 2024-08-11 02:21:55,930 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 02:22:09,044 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 02:22:11,476 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 02:22:26,977 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14300, loss[loss=0.08666, beats_loss=0.0135, ecapa_loss=0.0001998, whisper_loss=0.07117, over 22336.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01163, ecapa_loss=0.0002124, whisper_loss=0.09316, over 3884332.40 frames. ], batch size: 93, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:22:27,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867650.0, ans=0.125 2024-08-11 02:22:38,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=867650.0, ans=0.125 2024-08-11 02:22:39,228 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
16 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-11 02:22:43,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867750.0, ans=0.1 2024-08-11 02:23:07,290 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 02:23:11,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.633e+01 2.947e+01 3.319e+01 6.322e+01, threshold=5.893e+01, percent-clipped=0.0 2024-08-11 02:23:31,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14350, loss[loss=0.1176, beats_loss=0.01092, ecapa_loss=0.0002448, whisper_loss=0.1042, over 22167.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0116, ecapa_loss=0.0002133, whisper_loss=0.09281, over 3884536.21 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:23:34,092 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 02:23:35,696 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-11 02:23:35,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=868150.0, ans=0.125 2024-08-11 02:23:36,983 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 02:23:44,605 INFO [train_multi_KD3.py:844] (1/4) A total of 98 cuts. 29 from LS+wenet, 16 from Vox, 53 fro AS 2024-08-11 02:24:04,357 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 02:24:08,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=868350.0, ans=10.0 2024-08-11 02:24:26,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-11 02:24:28,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=868550.0, ans=0.5 2024-08-11 02:24:35,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=868650.0, ans=0.125 2024-08-11 02:24:35,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14400, loss[loss=0.1296, beats_loss=0.009869, ecapa_loss=0.0002956, whisper_loss=0.1168, over 20385.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0115, ecapa_loss=0.0002156, whisper_loss=0.09372, over 3881746.77 frames. ], batch size: 87, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:25:11,333 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 02:25:21,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.691e+01 3.158e+01 3.511e+01 8.025e+01, threshold=6.317e+01, percent-clipped=1.0 2024-08-11 02:25:28,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=869050.0, ans=0.0 2024-08-11 02:25:40,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 6, batch 14450, loss[loss=0.1089, beats_loss=0.01081, ecapa_loss=0.0002347, whisper_loss=0.09573, over 22508.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.0002166, whisper_loss=0.09311, over 3844692.89 frames. ], batch size: 89, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:25:41,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=869150.0, ans=0.125 2024-08-11 02:25:51,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=869150.0, ans=0.125 2024-08-11 02:25:55,811 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
14 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 02:25:58,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=869250.0, ans=0.0 2024-08-11 02:26:00,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869250.0, ans=0.125 2024-08-11 02:26:13,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=869350.0, ans=0.125 2024-08-11 02:26:15,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=869350.0, ans=0.125 2024-08-11 02:26:17,315 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 02:26:30,595 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 02:27:16,265 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 0, loss[loss=0.09769, beats_loss=0.01183, ecapa_loss=0.0002246, whisper_loss=0.08362, over 15630.00 frames. ], tot_loss[loss=0.09769, beats_loss=0.01183, ecapa_loss=0.0002246, whisper_loss=0.08362, over 15630.00 frames. ], batch size: 62, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:27:16,265 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 02:28:00,284 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006864, whisper_loss=0.2518, over 922467.00 frames. 2024-08-11 02:28:18,579 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on SV_voxceleb1: loss=0.00579, beats_loss=0, ecapa_loss=0.000579, whisper_loss=0, over 939242.00 frames. 
2024-08-11 02:28:42,748 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3468, 3.9533, 3.7817, 3.9168], device='cuda:1') 2024-08-11 02:30:27,689 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on AT_audioset: loss=0.02579, beats_loss=0.02579, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 02:30:27,692 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 02:30:32,618 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 02:30:58,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=869690.0, ans=0.125 2024-08-11 02:31:03,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=869690.0, ans=0.125 2024-08-11 02:32:06,385 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 02:32:35,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.976e+01 3.314e+01 3.996e+01 6.220e+01, threshold=6.628e+01, percent-clipped=0.0 2024-08-11 02:32:38,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=869990.0, ans=0.125 2024-08-11 02:32:49,087 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 02:33:01,371 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 02:33:11,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 50, loss[loss=0.09496, beats_loss=0.01183, ecapa_loss=0.0002606, whisper_loss=0.08052, over 17439.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01113, ecapa_loss=0.0002231, whisper_loss=0.0964, over 912537.03 frames. 
], batch size: 72, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:33:12,321 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 02:33:47,143 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 29 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 02:33:51,781 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-11 02:33:55,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=870190.0, ans=0.5 2024-08-11 02:34:12,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870190.0, ans=0.1 2024-08-11 02:34:28,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=870290.0, ans=0.125 2024-08-11 02:35:49,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=870490.0, ans=0.0 2024-08-11 02:35:53,072 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 02:36:18,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 100, loss[loss=0.08052, beats_loss=0.01313, ecapa_loss=0.000248, whisper_loss=0.0649, over 18555.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01121, ecapa_loss=0.0002192, whisper_loss=0.09411, over 1578891.66 frames. ], batch size: 82, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:36:44,764 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.85 vs. 
limit=22.5 2024-08-11 02:37:01,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=870690.0, ans=0.125 2024-08-11 02:37:14,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=870690.0, ans=0.125 2024-08-11 02:37:32,991 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 02:37:36,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=870790.0, ans=0.1 2024-08-11 02:37:52,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=870790.0, ans=0.2 2024-08-11 02:37:57,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-11 02:38:22,228 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 02:38:32,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 3.124e+01 3.380e+01 3.805e+01 6.032e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-11 02:38:46,156 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 02:38:49,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 150, loss[loss=0.09251, beats_loss=0.01291, ecapa_loss=0.00021, whisper_loss=0.0775, over 19304.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01098, ecapa_loss=0.0002173, whisper_loss=0.09425, over 2067774.27 frames. ], batch size: 79, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:39:02,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. 
limit=15.0 2024-08-11 02:39:08,446 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 02:39:28,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=12.0 2024-08-11 02:39:55,458 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-11 02:40:02,014 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 02:40:03,239 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 02:40:13,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871490.0, ans=0.1 2024-08-11 02:40:16,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 200, loss[loss=0.1051, beats_loss=0.01248, ecapa_loss=0.0002194, whisper_loss=0.09039, over 21499.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01109, ecapa_loss=0.0002163, whisper_loss=0.094, over 2458101.92 frames. ], batch size: 87, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:40:16,735 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 02:40:17,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=871590.0, ans=0.0 2024-08-11 02:40:58,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-11 02:40:59,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=871790.0, ans=0.125 2024-08-11 02:41:09,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. 
limit=22.5 2024-08-11 02:41:21,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.791e+01 3.109e+01 3.398e+01 1.022e+02, threshold=6.218e+01, percent-clipped=1.0 2024-08-11 02:41:28,953 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-11 02:41:35,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 250, loss[loss=0.1028, beats_loss=0.01168, ecapa_loss=0.0002082, whisper_loss=0.08902, over 21160.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0111, ecapa_loss=0.0002159, whisper_loss=0.09446, over 2741648.57 frames. ], batch size: 84, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:42:03,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 02:42:08,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=872290.0, ans=0.0 2024-08-11 02:42:11,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=872290.0, ans=0.0 2024-08-11 02:42:12,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=872290.0, ans=0.0 2024-08-11 02:42:21,614 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 02:42:23,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=872290.0, ans=0.2 2024-08-11 02:42:29,790 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 02:42:43,396 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 02:42:52,028 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 02:42:57,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 300, loss[loss=0.09833, beats_loss=0.01142, ecapa_loss=0.0001795, whisper_loss=0.08512, over 14084.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01106, ecapa_loss=0.0002151, whisper_loss=0.09348, over 2944738.21 frames. ], batch size: 53, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:43:34,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=872790.0, ans=0.1 2024-08-11 02:44:01,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872990.0, ans=0.1 2024-08-11 02:44:02,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.643e+01 2.910e+01 3.334e+01 5.693e+01, threshold=5.820e+01, percent-clipped=0.0 2024-08-11 02:44:06,816 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 02:44:11,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=872990.0, ans=0.125 2024-08-11 02:44:11,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=872990.0, ans=0.2 2024-08-11 02:44:15,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 350, loss[loss=0.09413, beats_loss=0.01038, ecapa_loss=0.0002135, whisper_loss=0.08161, over 14567.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01112, ecapa_loss=0.0002151, whisper_loss=0.09288, over 3087277.37 frames. ], batch size: 56, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:44:16,058 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 02:44:24,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-08-11 02:44:39,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-11 02:44:58,186 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-11 02:44:58,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=873290.0, ans=0.0 2024-08-11 02:45:04,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=15.0 2024-08-11 02:45:07,141 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 02:45:07,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=873390.0, ans=0.2 2024-08-11 02:45:09,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-08-11 02:45:11,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873390.0, ans=0.1 2024-08-11 02:45:15,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=22.5 2024-08-11 02:45:17,699 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 02:45:32,689 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 400, loss[loss=0.1011, beats_loss=0.009428, ecapa_loss=0.0002379, whisper_loss=0.08928, over 18871.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.0002118, whisper_loss=0.0927, over 3256299.18 frames. ], batch size: 73, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:45:42,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=873590.0, ans=0.125 2024-08-11 02:45:46,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=873590.0, ans=0.125 2024-08-11 02:45:47,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2024-08-11 02:45:48,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=873690.0, ans=0.125 2024-08-11 02:45:52,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=873690.0, ans=0.125 2024-08-11 02:45:52,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-11 02:45:57,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=873690.0, ans=0.125 2024-08-11 02:46:14,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. 
limit=15.0 2024-08-11 02:46:21,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=873890.0, ans=0.125 2024-08-11 02:46:31,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=873890.0, ans=0.1 2024-08-11 02:46:35,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.580e+01 2.895e+01 3.398e+01 1.445e+02, threshold=5.790e+01, percent-clipped=1.0 2024-08-11 02:46:38,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=873990.0, ans=0.125 2024-08-11 02:46:48,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 450, loss[loss=0.09829, beats_loss=0.01052, ecapa_loss=0.0002419, whisper_loss=0.08534, over 20044.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01125, ecapa_loss=0.0002113, whisper_loss=0.09239, over 3403869.14 frames. ], batch size: 82, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:47:14,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=874190.0, ans=0.0 2024-08-11 02:47:17,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=874290.0, ans=0.0 2024-08-11 02:47:18,446 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 29 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 02:47:23,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=874290.0, ans=0.0 2024-08-11 02:47:35,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. 
limit=22.5 2024-08-11 02:47:36,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=874390.0, ans=0.125 2024-08-11 02:47:43,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=874390.0, ans=0.0 2024-08-11 02:47:51,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=874490.0, ans=0.125 2024-08-11 02:48:00,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=874490.0, ans=0.125 2024-08-11 02:48:02,670 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 500, loss[loss=0.1076, beats_loss=0.01113, ecapa_loss=0.0001892, whisper_loss=0.09458, over 18870.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0002106, whisper_loss=0.09274, over 3516690.17 frames. ], batch size: 74, lr: 9.34e-03, grad_scale: 281474976710656.0 2024-08-11 02:48:07,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=874590.0, ans=0.0 2024-08-11 02:48:18,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=874690.0, ans=0.0 2024-08-11 02:48:24,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=874690.0, ans=0.025 2024-08-11 02:48:28,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=874690.0, ans=0.2 2024-08-11 02:48:39,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874790.0, ans=0.1 2024-08-11 02:48:50,699 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
25 from LS+wenet, 13 from Vox, 24 from AS 2024-08-11 02:48:58,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.783e+01 3.369e+01 3.762e+01 6.753e+01, threshold=6.739e+01, percent-clipped=3.0 2024-08-11 02:48:58,669 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 from AS 2024-08-11 02:48:58,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=874990.0, ans=0.125 2024-08-11 02:49:06,670 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 02:49:10,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 550, loss[loss=0.1209, beats_loss=0.01147, ecapa_loss=0.0001728, whisper_loss=0.1077, over 19370.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0002099, whisper_loss=0.09346, over 3578149.82 frames. ], batch size: 74, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:49:16,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=875090.0, ans=0.0 2024-08-11 02:49:23,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=875190.0, ans=0.2 2024-08-11 02:49:25,023 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 from AS 2024-08-11 02:49:35,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=875290.0, ans=0.125 2024-08-11 02:49:36,927 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:49:39,201 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 17 from Vox, 47 from AS 2024-08-11 02:49:41,627 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 16 from Vox, 28 from AS 2024-08-11 02:49:47,105 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 10 from Vox, 31 from AS 2024-08-11 02:50:00,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-08-11 02:50:05,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=875490.0, ans=0.0 2024-08-11 02:50:10,321 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 from AS 2024-08-11 02:50:15,342 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 600, loss[loss=0.1148, beats_loss=0.01114, ecapa_loss=0.0002014, whisper_loss=0.1017, over 17043.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01126, ecapa_loss=0.0002088, whisper_loss=0.09361, over 3645768.45 frames. ], batch size: 66, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:50:18,458 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 from AS 2024-08-11 02:50:23,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.59 vs. limit=8.0 2024-08-11 02:50:35,195 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 from AS 2024-08-11 02:50:35,458 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.751e-03 2024-08-11 02:51:09,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.703e+01 3.008e+01 3.347e+01 4.794e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 02:51:14,424 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
24 from LS+wenet, 17 from Vox, 29 from AS 2024-08-11 02:51:20,951 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 650, loss[loss=0.09187, beats_loss=0.01043, ecapa_loss=0.0002888, whisper_loss=0.07855, over 15819.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01125, ecapa_loss=0.0002073, whisper_loss=0.09366, over 3648644.27 frames. ], batch size: 70, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:51:31,428 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 02:51:35,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=876190.0, ans=0.0 2024-08-11 02:51:56,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=876290.0, ans=0.125 2024-08-11 02:51:59,570 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-11 02:52:13,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=876490.0, ans=0.0 2024-08-11 02:52:14,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=876490.0, ans=0.125 2024-08-11 02:52:16,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-11 02:52:26,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 700, loss[loss=0.08982, beats_loss=0.01265, ecapa_loss=0.0002359, whisper_loss=0.0748, over 21783.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01129, ecapa_loss=0.0002063, whisper_loss=0.09332, over 3664455.31 frames. ], batch size: 93, lr: 9.33e-03, grad_scale: 281474976710656.0 2024-08-11 02:52:34,559 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
32 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 02:52:36,978 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 from AS 2024-08-11 02:52:37,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=876590.0, ans=0.125 2024-08-11 02:52:42,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876690.0, ans=0.125 2024-08-11 02:52:43,400 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 from AS 2024-08-11 02:53:14,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876890.0, ans=0.1 2024-08-11 02:53:18,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=876990.0, ans=0.125 2024-08-11 02:53:19,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.856e+01 3.234e+01 3.790e+01 5.945e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-11 02:53:19,679 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS 2024-08-11 02:53:31,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 750, loss[loss=0.122, beats_loss=0.008985, ecapa_loss=0.0002456, whisper_loss=0.1105, over 16436.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01123, ecapa_loss=0.0002059, whisper_loss=0.09482, over 3731251.75 frames. 
], batch size: 66, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:53:35,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=877090.0, ans=0.0 2024-08-11 02:53:47,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=877190.0, ans=0.2 2024-08-11 02:54:02,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=877290.0, ans=0.125 2024-08-11 02:54:05,040 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 from AS 2024-08-11 02:54:05,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=877290.0, ans=0.0 2024-08-11 02:54:07,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=877290.0, ans=0.125 2024-08-11 02:54:16,887 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 02:54:17,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877390.0, ans=0.1 2024-08-11 02:54:18,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0 2024-08-11 02:54:27,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=877490.0, ans=0.05 2024-08-11 02:54:36,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 800, loss[loss=0.1208, beats_loss=0.01171, ecapa_loss=0.0001855, whisper_loss=0.1072, over 19316.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01128, ecapa_loss=0.0002051, whisper_loss=0.09378, over 3748531.44 frames. 
], batch size: 75, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:54:53,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=877690.0, ans=0.125 2024-08-11 02:54:56,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877690.0, ans=0.125 2024-08-11 02:54:59,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-11 02:54:59,679 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-11 02:55:05,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=877790.0, ans=0.95 2024-08-11 02:55:08,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=877790.0, ans=0.125 2024-08-11 02:55:14,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877890.0, ans=0.1 2024-08-11 02:55:16,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=877890.0, ans=0.2 2024-08-11 02:55:26,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877890.0, ans=0.1 2024-08-11 02:55:27,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=877990.0, ans=0.0 2024-08-11 02:55:29,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.644e+01 2.972e+01 3.441e+01 7.984e+01, threshold=5.944e+01, percent-clipped=1.0 2024-08-11 02:55:31,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, 
batch_count=877990.0, ans=0.0 2024-08-11 02:55:31,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=877990.0, ans=0.125 2024-08-11 02:55:33,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=877990.0, ans=0.0 2024-08-11 02:55:36,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=877990.0, ans=0.0 2024-08-11 02:55:41,242 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 850, loss[loss=0.1023, beats_loss=0.01255, ecapa_loss=0.0001818, whisper_loss=0.08797, over 22667.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01125, ecapa_loss=0.0002039, whisper_loss=0.09437, over 3790243.73 frames. ], batch size: 92, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:55:46,627 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 from AS 2024-08-11 02:55:49,159 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 from AS 2024-08-11 02:56:16,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=878290.0, ans=0.0 2024-08-11 02:56:17,230 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 20 from Vox, 37 from AS 2024-08-11 02:56:21,228 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 02:56:25,303 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 from AS 2024-08-11 02:56:32,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.16 vs. limit=22.5 2024-08-11 02:56:35,451 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
18 from LS+wenet, 23 from Vox, 32 from AS 2024-08-11 02:56:45,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 900, loss[loss=0.09949, beats_loss=0.01196, ecapa_loss=0.0001495, whisper_loss=0.08603, over 16970.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01117, ecapa_loss=0.0002027, whisper_loss=0.09417, over 3786090.07 frames. ], batch size: 64, lr: 9.32e-03, grad_scale: 281474976710656.0 2024-08-11 02:57:26,748 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 02:57:33,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=878890.0, ans=0.035 2024-08-11 02:57:33,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=878890.0, ans=0.95 2024-08-11 02:57:37,778 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 02:57:38,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.614e+01 2.988e+01 3.449e+01 5.810e+01, threshold=5.976e+01, percent-clipped=0.0 2024-08-11 02:57:50,390 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 02:57:51,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 950, loss[loss=0.09684, beats_loss=0.01223, ecapa_loss=0.0001764, whisper_loss=0.08285, over 16046.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01119, ecapa_loss=0.0002044, whisper_loss=0.09367, over 3766379.17 frames. 
], batch size: 62, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:57:55,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=879090.0, ans=0.125 2024-08-11 02:58:23,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=879290.0, ans=0.125 2024-08-11 02:58:27,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=879290.0, ans=0.2 2024-08-11 02:58:29,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=12.0 2024-08-11 02:58:33,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879390.0, ans=0.1 2024-08-11 02:58:40,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0 2024-08-11 02:58:47,182 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:58:48,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=879490.0, ans=0.2 2024-08-11 02:58:53,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=879490.0, ans=0.125 2024-08-11 02:59:00,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1000, loss[loss=0.1019, beats_loss=0.01018, ecapa_loss=0.0001678, whisper_loss=0.09009, over 17213.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01128, ecapa_loss=0.0002028, whisper_loss=0.09325, over 3787974.61 frames. 
], batch size: 64, lr: 9.31e-03, grad_scale: 281474976710656.0 2024-08-11 02:59:19,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879690.0, ans=0.1 2024-08-11 02:59:23,610 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 from AS 2024-08-11 02:59:43,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=879890.0, ans=0.05 2024-08-11 02:59:48,646 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 28 from LS+wenet, 19 from Vox, 21 from AS 2024-08-11 02:59:52,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=879890.0, ans=0.2 2024-08-11 02:59:52,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=879890.0, ans=0.2 2024-08-11 02:59:56,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-11 02:59:56,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. limit=10.0 2024-08-11 03:00:01,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.796e+01 3.092e+01 3.418e+01 4.355e+01, threshold=6.184e+01, percent-clipped=0.0 2024-08-11 03:00:12,790 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 9 from Vox, 32 from AS 2024-08-11 03:00:13,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1050, loss[loss=0.1128, beats_loss=0.0142, ecapa_loss=0.0001633, whisper_loss=0.09692, over 16155.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01141, ecapa_loss=0.0002023, whisper_loss=0.0924, over 3791216.43 frames. 
], batch size: 61, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:00:24,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2024-08-11 03:00:25,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=880090.0, ans=0.0 2024-08-11 03:00:27,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2024-08-11 03:00:48,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=880290.0, ans=0.0 2024-08-11 03:00:48,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=880290.0, ans=0.0 2024-08-11 03:01:06,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=880390.0, ans=0.1 2024-08-11 03:01:10,566 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 from AS 2024-08-11 03:01:10,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=880490.0, ans=0.07 2024-08-11 03:01:15,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=880490.0, ans=0.125 2024-08-11 03:01:21,821 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.009e+01 2024-08-11 03:01:23,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. 
limit=15.0 2024-08-11 03:01:27,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1100, loss[loss=0.1006, beats_loss=0.01299, ecapa_loss=0.0002253, whisper_loss=0.08532, over 19749.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01137, ecapa_loss=0.000202, whisper_loss=0.09248, over 3774191.70 frames. ], batch size: 80, lr: 9.31e-03, grad_scale: 562949953421312.0 2024-08-11 03:01:30,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880590.0, ans=0.1 2024-08-11 03:01:30,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=880590.0, ans=0.0 2024-08-11 03:01:42,278 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 from AS 2024-08-11 03:01:53,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=880690.0, ans=0.0 2024-08-11 03:01:53,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2024-08-11 03:01:59,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880790.0, ans=0.1 2024-08-11 03:02:04,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=880790.0, ans=0.2 2024-08-11 03:02:11,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. 
limit=22.5 2024-08-11 03:02:20,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=880890.0, ans=0.125 2024-08-11 03:02:27,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.648e+01 3.166e+01 3.461e+01 5.758e+01, threshold=6.333e+01, percent-clipped=0.0 2024-08-11 03:02:31,702 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 from AS 2024-08-11 03:02:36,556 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 03:02:36,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=880990.0, ans=0.125 2024-08-11 03:02:40,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1150, loss[loss=0.1121, beats_loss=0.01348, ecapa_loss=0.0001482, whisper_loss=0.09717, over 23677.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.000201, whisper_loss=0.09432, over 3827325.82 frames. ], batch size: 89, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:02:56,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2024-08-11 03:02:56,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=881190.0, ans=0.2 2024-08-11 03:03:04,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. 
limit=15.0 2024-08-11 03:03:05,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=881190.0, ans=0.0 2024-08-11 03:03:17,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=881290.0, ans=0.0 2024-08-11 03:03:25,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=881390.0, ans=0.2 2024-08-11 03:03:40,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=881490.0, ans=0.0 2024-08-11 03:03:43,478 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 30 from Vox, 36 from AS 2024-08-11 03:03:49,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=881490.0, ans=0.1 2024-08-11 03:03:52,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1200, loss[loss=0.119, beats_loss=0.008313, ecapa_loss=0.0002649, whisper_loss=0.1081, over 22090.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0002016, whisper_loss=0.09365, over 3797076.24 frames. ], batch size: 90, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:04:18,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=881690.0, ans=0.0 2024-08-11 03:04:22,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.73 vs. 
limit=15.0 2024-08-11 03:04:24,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=881790.0, ans=0.125 2024-08-11 03:04:25,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=881790.0, ans=0.125 2024-08-11 03:04:45,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=881890.0, ans=0.0 2024-08-11 03:04:47,106 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 03:04:52,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.507e+01 2.887e+01 3.348e+01 4.586e+01, threshold=5.774e+01, percent-clipped=0.0 2024-08-11 03:05:05,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1250, loss[loss=0.1126, beats_loss=0.009737, ecapa_loss=0.0002161, whisper_loss=0.1007, over 21700.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01132, ecapa_loss=0.0002012, whisper_loss=0.09414, over 3824811.41 frames. ], batch size: 82, lr: 9.30e-03, grad_scale: 562949953421312.0 2024-08-11 03:05:15,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=882090.0, ans=0.0 2024-08-11 03:05:17,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2024-08-11 03:05:28,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=882190.0, ans=0.09899494936611666 2024-08-11 03:05:55,031 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
19 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 03:05:56,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=882390.0, ans=0.0 2024-08-11 03:05:57,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882390.0, ans=0.1 2024-08-11 03:06:12,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882490.0, ans=0.1 2024-08-11 03:06:20,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1300, loss[loss=0.109, beats_loss=0.01295, ecapa_loss=0.0002054, whisper_loss=0.09405, over 22239.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.000202, whisper_loss=0.09429, over 3829353.06 frames. ], batch size: 91, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:06:33,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=882690.0, ans=0.125 2024-08-11 03:06:40,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=882690.0, ans=0.2 2024-08-11 03:06:42,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882690.0, ans=0.125 2024-08-11 03:06:43,448 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
22 from LS+wenet, 10 from Vox, 22 from AS 2024-08-11 03:06:58,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=882790.0, ans=0.0 2024-08-11 03:07:18,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=882990.0, ans=0.0 2024-08-11 03:07:20,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.642e+01 3.016e+01 3.566e+01 8.330e+01, threshold=6.031e+01, percent-clipped=1.0 2024-08-11 03:07:25,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=882990.0, ans=0.125 2024-08-11 03:07:34,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1350, loss[loss=0.1078, beats_loss=0.01243, ecapa_loss=0.0001613, whisper_loss=0.0938, over 16255.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01137, ecapa_loss=0.0001999, whisper_loss=0.09425, over 3848510.46 frames. ], batch size: 63, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:07:43,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883090.0, ans=0.125 2024-08-11 03:07:46,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=883090.0, ans=0.0 2024-08-11 03:07:54,886 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 from AS 2024-08-11 03:07:59,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=883190.0, ans=0.035 2024-08-11 03:08:02,586 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 from AS 2024-08-11 03:08:06,859 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 03:08:12,135 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 from AS 2024-08-11 03:08:30,497 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:08:48,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1400, loss[loss=0.107, beats_loss=0.009632, ecapa_loss=0.0002078, whisper_loss=0.0953, over 18066.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0002012, whisper_loss=0.09361, over 3818449.54 frames. ], batch size: 71, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:08:56,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883590.0, ans=0.125 2024-08-11 03:09:07,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=883690.0, ans=0.05 2024-08-11 03:09:14,789 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 18 from Vox, 43 from AS 2024-08-11 03:09:20,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=883790.0, ans=0.0 2024-08-11 03:09:22,353 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.696e+05 2024-08-11 03:09:34,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=883890.0, ans=0.95 2024-08-11 03:09:43,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=883890.0, ans=0.1 2024-08-11 03:09:49,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.657e+01 3.071e+01 3.496e+01 6.029e+01, threshold=6.143e+01, percent-clipped=0.0 2024-08-11 03:10:37,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1450, loss[loss=0.1018, beats_loss=0.01304, ecapa_loss=0.000223, whisper_loss=0.0865, over 18310.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01132, ecapa_loss=0.0002005, whisper_loss=0.09382, over 3812960.95 frames. ], batch size: 76, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:10:40,205 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 03:11:07,778 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 03:11:14,007 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 from AS 2024-08-11 03:11:21,471 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 11 from Vox, 35 from AS 2024-08-11 03:11:35,081 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 03:11:35,409 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.102e-01 2024-08-11 03:11:43,462 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 03:11:47,217 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 03:11:53,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1500, loss[loss=0.1074, beats_loss=0.01009, ecapa_loss=0.0002565, whisper_loss=0.09475, over 16911.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01146, ecapa_loss=0.000201, whisper_loss=0.09214, over 3795492.65 frames. ], batch size: 70, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:12:10,433 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 03:12:16,425 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-11 03:12:20,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2024-08-11 03:12:20,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=884790.0, ans=0.2 2024-08-11 03:12:43,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=884890.0, ans=0.0 2024-08-11 03:12:43,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. 
limit=22.5 2024-08-11 03:12:45,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884890.0, ans=0.1 2024-08-11 03:12:48,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2024-08-11 03:12:53,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.731e+01 3.107e+01 3.593e+01 6.683e+01, threshold=6.214e+01, percent-clipped=1.0 2024-08-11 03:13:05,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=884990.0, ans=0.0 2024-08-11 03:13:07,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1550, loss[loss=0.1101, beats_loss=0.01143, ecapa_loss=0.0001847, whisper_loss=0.0968, over 20766.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01142, ecapa_loss=0.0002012, whisper_loss=0.0919, over 3800481.56 frames. ], batch size: 81, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:13:10,740 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
31 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 03:13:13,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=885090.0, ans=10.0 2024-08-11 03:13:16,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=885090.0, ans=0.0 2024-08-11 03:13:50,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=885390.0, ans=0.2 2024-08-11 03:13:57,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=885390.0, ans=0.125 2024-08-11 03:14:04,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=885390.0, ans=0.0 2024-08-11 03:14:21,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1600, loss[loss=0.09531, beats_loss=0.01449, ecapa_loss=0.0001674, whisper_loss=0.07915, over 18489.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01139, ecapa_loss=0.0002002, whisper_loss=0.09213, over 3803411.93 frames. ], batch size: 78, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:14:25,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=885590.0, ans=0.04949747468305833 2024-08-11 03:14:28,628 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 03:14:30,118 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 03:14:33,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=885590.0, ans=0.0 2024-08-11 03:15:02,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=885790.0, ans=0.125 2024-08-11 03:15:05,256 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 03:15:15,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0 2024-08-11 03:15:21,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.608e+01 2.973e+01 3.361e+01 6.559e+01, threshold=5.946e+01, percent-clipped=1.0 2024-08-11 03:15:26,101 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 03:15:27,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=885990.0, ans=0.07 2024-08-11 03:15:29,043 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:15:32,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885990.0, ans=0.1 2024-08-11 03:15:32,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885990.0, ans=0.1 2024-08-11 03:15:34,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1650, loss[loss=0.09991, beats_loss=0.0123, ecapa_loss=0.0001844, whisper_loss=0.08577, over 17654.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01139, ecapa_loss=0.0001993, whisper_loss=0.09267, over 3825371.16 frames. 
], batch size: 69, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:15:45,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=886090.0, ans=0.125 2024-08-11 03:15:46,421 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 03:15:50,900 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 03:16:05,455 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.898e+02 2024-08-11 03:16:17,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2024-08-11 03:16:44,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1700, loss[loss=0.09982, beats_loss=0.0126, ecapa_loss=0.0001579, whisper_loss=0.08564, over 18720.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01134, ecapa_loss=0.0001992, whisper_loss=0.0935, over 3831800.05 frames. ], batch size: 73, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:16:51,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. 
limit=6.0 2024-08-11 03:17:29,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=886890.0, ans=0.125 2024-08-11 03:17:30,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=886890.0, ans=0.125 2024-08-11 03:17:42,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.697e+01 3.081e+01 3.373e+01 4.997e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 03:17:52,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=886990.0, ans=0.125 2024-08-11 03:17:55,102 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1750, loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001949, whisper_loss=0.08952, over 14554.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01138, ecapa_loss=0.0001993, whisper_loss=0.09229, over 3799972.15 frames. ], batch size: 55, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:18:14,978 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 03:18:22,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=887290.0, ans=0.2 2024-08-11 03:18:26,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-11 03:18:30,428 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 03:18:33,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=887290.0, ans=0.0 2024-08-11 03:18:43,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=887390.0, ans=0.2 2024-08-11 03:18:43,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887390.0, ans=0.1 2024-08-11 03:18:44,498 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 03:18:49,778 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 03:18:53,876 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 03:18:57,888 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 03:19:00,304 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 03:19:03,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1800, loss[loss=0.0957, beats_loss=0.01229, ecapa_loss=0.0001923, whisper_loss=0.08148, over 18144.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0002007, whisper_loss=0.09261, over 3843160.35 frames. ], batch size: 73, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:19:06,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.773e-01 2024-08-11 03:19:10,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=887590.0, ans=0.2 2024-08-11 03:19:31,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.36 vs. 
limit=10.0 2024-08-11 03:19:57,679 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 03:20:00,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.601e+01 2.973e+01 3.471e+01 4.949e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-11 03:20:00,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=887990.0, ans=0.1 2024-08-11 03:20:05,679 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 03:20:13,334 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1850, loss[loss=0.1032, beats_loss=0.01231, ecapa_loss=0.0002119, whisper_loss=0.08873, over 22445.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01139, ecapa_loss=0.0002016, whisper_loss=0.09254, over 3838835.46 frames. ], batch size: 91, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:20:22,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=888090.0, ans=0.125 2024-08-11 03:20:26,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=888190.0, ans=0.125 2024-08-11 03:20:44,286 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-11 03:20:44,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.17 vs. 
limit=22.5 2024-08-11 03:20:52,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=888290.0, ans=0.0 2024-08-11 03:21:20,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=888490.0, ans=0.125 2024-08-11 03:21:22,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1900, loss[loss=0.07779, beats_loss=0.01205, ecapa_loss=0.0002023, whisper_loss=0.06373, over 14709.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0002045, whisper_loss=0.09279, over 3836315.25 frames. ], batch size: 59, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:21:37,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=888690.0, ans=0.0 2024-08-11 03:21:38,357 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 03:21:42,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=888690.0, ans=0.0 2024-08-11 03:22:04,835 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 03:22:10,196 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 03:22:16,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.590e+01 3.002e+01 3.327e+01 6.064e+01, threshold=6.004e+01, percent-clipped=1.0 2024-08-11 03:22:17,255 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 30 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 03:22:30,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 1950, loss[loss=0.1065, beats_loss=0.00903, ecapa_loss=0.0002749, whisper_loss=0.0947, over 17379.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0002063, whisper_loss=0.09283, over 3814320.42 frames. 
], batch size: 73, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:22:39,863 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 03:22:47,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889190.0, ans=0.125 2024-08-11 03:22:52,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=889190.0, ans=0.0 2024-08-11 03:23:12,274 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-11 03:23:36,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=889490.0, ans=0.0 2024-08-11 03:23:38,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2000, loss[loss=0.1167, beats_loss=0.009842, ecapa_loss=0.0002556, whisper_loss=0.1043, over 21121.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0002064, whisper_loss=0.09245, over 3786455.15 frames. ], batch size: 84, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:23:47,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=889590.0, ans=0.0 2024-08-11 03:23:47,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=889590.0, ans=0.0 2024-08-11 03:23:56,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=889690.0, ans=0.125 2024-08-11 03:24:04,936 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 03:24:34,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.753e+01 3.127e+01 3.595e+01 5.672e+01, threshold=6.254e+01, percent-clipped=0.0 2024-08-11 03:24:47,627 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2050, loss[loss=0.1016, beats_loss=0.01186, ecapa_loss=0.0001887, whisper_loss=0.08788, over 20245.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01146, ecapa_loss=0.0002056, whisper_loss=0.09221, over 3834670.11 frames. ], batch size: 81, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:25:16,242 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 03:25:20,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=890290.0, ans=0.2 2024-08-11 03:25:26,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=890290.0, ans=0.0 2024-08-11 03:25:37,799 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 03:25:45,032 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 03:25:59,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=890590.0, ans=0.125 2024-08-11 03:26:00,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2100, loss[loss=0.1136, beats_loss=0.008631, ecapa_loss=0.0002321, whisper_loss=0.1027, over 22302.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01152, ecapa_loss=0.0002053, whisper_loss=0.09117, over 3813513.36 frames. ], batch size: 91, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:26:02,697 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 03:26:09,253 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 03:26:12,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=890590.0, ans=0.5 2024-08-11 03:26:18,865 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 03:26:20,345 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 03:26:22,107 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 03:26:25,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=890690.0, ans=0.0 2024-08-11 03:26:30,858 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 03:26:32,453 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 03:26:32,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=890790.0, ans=0.125 2024-08-11 03:27:01,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.655e+01 3.007e+01 3.449e+01 4.820e+01, threshold=6.014e+01, percent-clipped=0.0 2024-08-11 03:27:01,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=890990.0, ans=0.0 2024-08-11 03:27:14,300 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2150, loss[loss=0.1033, beats_loss=0.01228, ecapa_loss=0.0001647, whisper_loss=0.08937, over 21817.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01148, ecapa_loss=0.000205, whisper_loss=0.09215, over 3818895.31 frames. 
], batch size: 83, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:27:18,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=891090.0, ans=10.0 2024-08-11 03:27:22,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=891090.0, ans=0.0 2024-08-11 03:27:27,806 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 03:27:57,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-11 03:28:02,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=891390.0, ans=0.0 2024-08-11 03:28:26,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2200, loss[loss=0.108, beats_loss=0.01102, ecapa_loss=0.0001986, whisper_loss=0.09497, over 16158.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01145, ecapa_loss=0.0002059, whisper_loss=0.09283, over 3804324.76 frames. ], batch size: 61, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:28:27,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=891590.0, ans=0.125 2024-08-11 03:29:09,555 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-11 03:29:11,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=891890.0, ans=0.05 2024-08-11 03:29:27,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=891990.0, ans=0.0 2024-08-11 03:29:27,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.671e+01 3.021e+01 3.496e+01 5.518e+01, threshold=6.042e+01, percent-clipped=0.0 2024-08-11 03:29:33,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=891990.0, ans=0.04949747468305833 2024-08-11 03:29:39,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=892090.0, ans=0.0 2024-08-11 03:29:40,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2250, loss[loss=0.08481, beats_loss=0.01414, ecapa_loss=0.0002156, whisper_loss=0.06851, over 17811.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0114, ecapa_loss=0.0002079, whisper_loss=0.09411, over 3805690.04 frames. ], batch size: 75, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:30:03,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892190.0, ans=0.1 2024-08-11 03:30:16,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892290.0, ans=0.125 2024-08-11 03:30:29,025 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.707e-01 2024-08-11 03:30:34,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. 
limit=15.0 2024-08-11 03:30:47,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=892490.0, ans=0.0 2024-08-11 03:30:52,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2300, loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.0001878, whisper_loss=0.09272, over 18099.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002085, whisper_loss=0.09392, over 3841544.17 frames. ], batch size: 69, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:30:58,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=892590.0, ans=0.2 2024-08-11 03:31:04,469 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 13 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 03:31:14,168 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 03:31:16,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2024-08-11 03:31:20,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=892690.0, ans=0.07 2024-08-11 03:31:28,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2024-08-11 03:31:32,026 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 03:31:33,584 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 03:31:38,527 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 03:31:54,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.862e+01 3.270e+01 3.564e+01 5.997e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-11 03:31:58,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=892990.0, ans=0.0 2024-08-11 03:32:09,156 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2350, loss[loss=0.08493, beats_loss=0.01242, ecapa_loss=0.0002399, whisper_loss=0.07011, over 21506.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01136, ecapa_loss=0.0002095, whisper_loss=0.09422, over 3821660.87 frames. ], batch size: 91, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:32:13,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=893090.0, ans=0.125 2024-08-11 03:32:16,115 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 03:32:27,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893190.0, ans=0.1 2024-08-11 03:32:33,008 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 03:32:34,459 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 03:32:42,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. 
limit=15.0 2024-08-11 03:32:53,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:33:24,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=893590.0, ans=0.125 2024-08-11 03:33:25,640 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2400, loss[loss=0.1024, beats_loss=0.01552, ecapa_loss=0.0001379, whisper_loss=0.08552, over 15427.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01135, ecapa_loss=0.0002079, whisper_loss=0.09515, over 3808038.87 frames. ], batch size: 60, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:33:43,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=893690.0, ans=0.1 2024-08-11 03:34:28,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.589e+01 2.936e+01 3.311e+01 5.160e+01, threshold=5.871e+01, percent-clipped=0.0 2024-08-11 03:34:34,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-08-11 03:34:36,181 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:34:42,127 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 03:34:44,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2450, loss[loss=0.1153, beats_loss=0.00923, ecapa_loss=0.0002301, whisper_loss=0.1037, over 16875.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0113, ecapa_loss=0.0002089, whisper_loss=0.09537, over 3840564.96 frames. ], batch size: 67, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:34:54,783 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 03:34:55,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.40 vs. limit=10.0 2024-08-11 03:34:56,087 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 03:34:57,382 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 03:35:03,256 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 03:35:03,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=894190.0, ans=0.2 2024-08-11 03:35:20,234 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-11 03:35:42,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=894490.0, ans=0.125 2024-08-11 03:35:45,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894490.0, ans=0.1 2024-08-11 03:35:46,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=894490.0, ans=0.07 2024-08-11 03:35:58,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2500, loss[loss=0.1217, beats_loss=0.01194, ecapa_loss=0.0001996, whisper_loss=0.1078, over 22638.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01127, ecapa_loss=0.0002098, whisper_loss=0.09532, over 3821954.85 frames. ], batch size: 90, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:36:09,119 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 03:36:15,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=894690.0, ans=0.125 2024-08-11 03:36:33,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=894790.0, ans=0.125 2024-08-11 03:37:03,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.710e+01 3.039e+01 3.423e+01 5.787e+01, threshold=6.079e+01, percent-clipped=0.0 2024-08-11 03:37:11,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=894990.0, ans=0.125 2024-08-11 03:37:11,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=894990.0, ans=10.0 2024-08-11 03:37:16,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2550, loss[loss=0.08077, beats_loss=0.0147, ecapa_loss=0.0002171, whisper_loss=0.0639, over 17789.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01129, ecapa_loss=0.0002102, whisper_loss=0.09522, over 3857991.49 frames. ], batch size: 73, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:37:32,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2024-08-11 03:37:32,873 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 36 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 03:37:48,652 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 03:38:06,433 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
30 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 03:38:13,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=895390.0, ans=0.125 2024-08-11 03:38:22,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=895490.0, ans=0.125 2024-08-11 03:38:32,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=895590.0, ans=0.125 2024-08-11 03:38:33,467 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2600, loss[loss=0.09114, beats_loss=0.01403, ecapa_loss=0.000207, whisper_loss=0.07504, over 16123.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01137, ecapa_loss=0.000209, whisper_loss=0.09498, over 3854488.99 frames. ], batch size: 66, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:38:41,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2024-08-11 03:38:59,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=895690.0, ans=0.0 2024-08-11 03:39:02,373 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.295e-02 2024-08-11 03:39:30,619 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 03:39:31,929 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 03:39:33,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=895990.0, ans=0.0 2024-08-11 03:39:36,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.572e+01 2.896e+01 3.197e+01 4.923e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 03:39:50,276 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2650, loss[loss=0.08975, beats_loss=0.01332, ecapa_loss=0.0001601, whisper_loss=0.07484, over 15935.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01137, ecapa_loss=0.0002095, whisper_loss=0.09446, over 3845151.21 frames. ], batch size: 63, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:40:01,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=896090.0, ans=0.125 2024-08-11 03:40:15,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896190.0, ans=0.1 2024-08-11 03:40:39,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=896390.0, ans=0.2 2024-08-11 03:40:42,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=896390.0, ans=0.125 2024-08-11 03:40:49,644 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 03:40:58,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=896490.0, ans=0.0 2024-08-11 03:41:02,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=896490.0, ans=0.05 2024-08-11 03:41:05,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2700, loss[loss=0.1184, beats_loss=0.01113, ecapa_loss=0.0001955, whisper_loss=0.1053, over 17539.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01139, ecapa_loss=0.0002088, whisper_loss=0.09431, over 3859176.50 frames. ], batch size: 70, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:41:44,290 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 03:42:10,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=896990.0, ans=0.125 2024-08-11 03:42:11,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.641e+01 2.955e+01 3.583e+01 6.037e+01, threshold=5.910e+01, percent-clipped=1.0 2024-08-11 03:42:21,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=896990.0, ans=0.2 2024-08-11 03:42:25,486 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2750, loss[loss=0.116, beats_loss=0.0109, ecapa_loss=0.0002624, whisper_loss=0.1025, over 22059.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0002078, whisper_loss=0.0936, over 3863509.15 frames. ], batch size: 92, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:42:35,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=897090.0, ans=0.125 2024-08-11 03:43:11,956 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
27 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 03:43:13,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=897390.0, ans=0.0 2024-08-11 03:43:23,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=897390.0, ans=0.09899494936611666 2024-08-11 03:43:43,793 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2800, loss[loss=0.09928, beats_loss=0.01224, ecapa_loss=0.0002154, whisper_loss=0.08488, over 18993.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01142, ecapa_loss=0.0002077, whisper_loss=0.09361, over 3852726.97 frames. ], batch size: 78, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:43:57,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=897590.0, ans=0.125 2024-08-11 03:44:10,481 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 03:44:32,188 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 45 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 03:44:48,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.710e+01 2.962e+01 3.650e+01 5.339e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 03:44:54,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=897990.0, ans=0.1 2024-08-11 03:45:02,306 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2850, loss[loss=0.09859, beats_loss=0.01126, ecapa_loss=0.0001791, whisper_loss=0.08554, over 14756.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0002084, whisper_loss=0.09367, over 3853578.26 frames. ], batch size: 54, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:45:08,546 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 03:45:12,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2024-08-11 03:45:13,021 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 03:45:22,978 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:45:27,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898190.0, ans=0.1 2024-08-11 03:45:29,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=898190.0, ans=0.125 2024-08-11 03:45:48,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=898290.0, ans=0.0 2024-08-11 03:46:08,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=898490.0, ans=0.0 2024-08-11 03:46:10,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=898490.0, ans=0.2 2024-08-11 03:46:24,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=898590.0, ans=0.1 2024-08-11 03:46:25,194 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2900, loss[loss=0.09527, beats_loss=0.01174, ecapa_loss=0.0002005, whisper_loss=0.08153, over 18024.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01148, ecapa_loss=0.0002091, whisper_loss=0.09371, over 3854899.62 frames. ], batch size: 73, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:46:25,397 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 03:46:45,574 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 03:46:50,360 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 03:47:14,739 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 03:47:18,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=898890.0, ans=0.125 2024-08-11 03:47:21,092 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 03:47:30,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.659e+01 2.989e+01 3.721e+01 7.203e+01, threshold=5.978e+01, percent-clipped=1.0 2024-08-11 03:47:31,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=898990.0, ans=0.1 2024-08-11 03:47:44,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=899090.0, ans=0.125 2024-08-11 03:47:45,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 2950, loss[loss=0.1194, beats_loss=0.01051, ecapa_loss=0.0001981, whisper_loss=0.1069, over 22511.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01149, ecapa_loss=0.0002104, whisper_loss=0.09364, over 3852850.25 frames. ], batch size: 89, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:47:49,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=899090.0, ans=0.035 2024-08-11 03:48:11,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=899190.0, ans=0.0 2024-08-11 03:48:14,762 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 03:49:08,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3000, loss[loss=0.1174, beats_loss=0.01106, ecapa_loss=0.0001999, whisper_loss=0.1044, over 20065.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0114, ecapa_loss=0.0002096, whisper_loss=0.09421, over 3848964.50 frames. ], batch size: 78, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:49:08,462 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 03:49:33,030 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.2874, 1.7510, 2.1906, 1.5899, 2.2877, 2.2299, 2.3399, 2.0903], device='cuda:1') 2024-08-11 03:49:48,996 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006718, whisper_loss=0.2519, over 922467.00 frames. 2024-08-11 03:50:07,432 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on SV_voxceleb1: loss=0.005617, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0, over 939242.00 frames. 2024-08-11 03:52:03,493 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on AT_audioset: loss=0.02572, beats_loss=0.02572, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 03:52:03,496 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 03:52:18,320 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 03:52:24,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=899690.0, ans=0.2 2024-08-11 03:52:36,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899790.0, ans=0.125 2024-08-11 03:53:06,620 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
25 from LS+wenet, 9 from Vox, 20 fro AS 2024-08-11 03:53:08,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=15.0 2024-08-11 03:53:15,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.673e+01 3.038e+01 3.538e+01 6.757e+01, threshold=6.077e+01, percent-clipped=1.0 2024-08-11 03:53:26,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 03:53:30,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3050, loss[loss=0.1333, beats_loss=0.01098, ecapa_loss=0.0001936, whisper_loss=0.1203, over 20584.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01138, ecapa_loss=0.0002118, whisper_loss=0.09489, over 3890801.48 frames. ], batch size: 79, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:53:32,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=900090.0, ans=0.125 2024-08-11 03:53:41,134 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 03:53:53,235 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=12.0 2024-08-11 03:53:58,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2024-08-11 03:54:03,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-11 03:54:09,401 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 03:54:10,010 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-11 03:54:11,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900290.0, ans=0.1 2024-08-11 03:54:22,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=900390.0, ans=10.0 2024-08-11 03:54:32,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=900390.0, ans=0.125 2024-08-11 03:54:37,746 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 03:54:37,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=900390.0, ans=0.125 2024-08-11 03:54:46,966 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 03:54:57,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3100, loss[loss=0.1006, beats_loss=0.0101, ecapa_loss=0.000246, whisper_loss=0.08802, over 22265.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01137, ecapa_loss=0.0002113, whisper_loss=0.09512, over 3911388.88 frames. ], batch size: 91, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:55:38,442 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 03:55:45,743 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 03:55:56,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=900890.0, ans=0.125 2024-08-11 03:55:56,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=900890.0, ans=0.07 2024-08-11 03:55:59,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=900890.0, ans=0.125 2024-08-11 03:56:05,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.642e+01 2.994e+01 3.477e+01 5.395e+01, threshold=5.988e+01, percent-clipped=0.0 2024-08-11 03:56:17,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=900990.0, ans=0.0 2024-08-11 03:56:20,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3150, loss[loss=0.1201, beats_loss=0.01206, ecapa_loss=0.0002233, whisper_loss=0.1058, over 18531.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01143, ecapa_loss=0.0002109, whisper_loss=0.09544, over 3907362.75 frames. ], batch size: 75, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:57:08,182 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 03:57:22,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=901390.0, ans=0.2 2024-08-11 03:57:24,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=901490.0, ans=0.0 2024-08-11 03:57:44,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3200, loss[loss=0.09792, beats_loss=0.01315, ecapa_loss=0.000208, whisper_loss=0.08269, over 23337.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01147, ecapa_loss=0.0002105, whisper_loss=0.09577, over 3937494.52 frames. 
], batch size: 94, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:57:49,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=901590.0, ans=0.125 2024-08-11 03:57:52,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901590.0, ans=0.125 2024-08-11 03:57:57,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=901590.0, ans=0.0 2024-08-11 03:58:06,834 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 03:58:15,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=901790.0, ans=0.125 2024-08-11 03:58:15,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=901790.0, ans=0.0 2024-08-11 03:58:34,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901890.0, ans=0.1 2024-08-11 03:58:36,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901890.0, ans=0.125 2024-08-11 03:58:38,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=901890.0, ans=0.125 2024-08-11 03:58:51,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.680e+01 2.966e+01 3.598e+01 6.746e+01, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 03:59:06,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3250, loss[loss=0.1278, beats_loss=0.01131, ecapa_loss=0.0002364, whisper_loss=0.1141, over 21734.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01138, ecapa_loss=0.0002102, whisper_loss=0.09598, over 3933490.95 frames. 
], batch size: 88, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 03:59:23,606 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 03:59:28,461 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 03:59:46,109 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 03:59:48,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=902290.0, ans=0.0 2024-08-11 04:00:04,225 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 04:00:20,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902490.0, ans=0.1 2024-08-11 04:00:21,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=902490.0, ans=0.2 2024-08-11 04:00:23,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=902490.0, ans=0.2 2024-08-11 04:00:25,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3300, loss[loss=0.08175, beats_loss=0.01192, ecapa_loss=0.0001919, whisper_loss=0.06791, over 19676.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01149, ecapa_loss=0.0002091, whisper_loss=0.09489, over 3933190.34 frames. ], batch size: 80, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:00:56,638 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 04:01:05,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs. 
limit=10.0 2024-08-11 04:01:06,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=902790.0, ans=0.035 2024-08-11 04:01:25,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=902890.0, ans=0.0 2024-08-11 04:01:27,561 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 04:01:38,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.725e+01 3.245e+01 3.907e+01 7.359e+01, threshold=6.490e+01, percent-clipped=2.0 2024-08-11 04:01:52,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3350, loss[loss=0.09975, beats_loss=0.01128, ecapa_loss=0.0001667, whisper_loss=0.08681, over 24198.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0114, ecapa_loss=0.0002103, whisper_loss=0.09444, over 3923887.91 frames. ], batch size: 93, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:01:52,534 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 20 from LS+wenet, 19 from Vox, 54 fro AS 2024-08-11 04:02:12,608 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 04:02:16,919 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 04:02:42,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=903390.0, ans=0.125 2024-08-11 04:02:50,086 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.304e-01 2024-08-11 04:03:00,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=903490.0, ans=0.125 2024-08-11 04:03:09,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903490.0, ans=0.125 2024-08-11 04:03:12,183 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 04:03:13,318 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3400, loss[loss=0.09655, beats_loss=0.01244, ecapa_loss=0.0001839, whisper_loss=0.08227, over 15104.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01158, ecapa_loss=0.000209, whisper_loss=0.09346, over 3931481.59 frames. ], batch size: 60, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:03:14,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. 
limit=15.0 2024-08-11 04:03:35,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=903690.0, ans=0.0 2024-08-11 04:03:35,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=903690.0, ans=0.125 2024-08-11 04:04:01,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=903890.0, ans=0.09899494936611666 2024-08-11 04:04:03,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=903890.0, ans=0.07 2024-08-11 04:04:18,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.715e+01 3.132e+01 3.599e+01 6.001e+01, threshold=6.265e+01, percent-clipped=0.0 2024-08-11 04:04:29,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=903990.0, ans=0.0 2024-08-11 04:04:32,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3450, loss[loss=0.1128, beats_loss=0.01124, ecapa_loss=0.0002115, whisper_loss=0.09947, over 17694.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01157, ecapa_loss=0.0002098, whisper_loss=0.09305, over 3927100.11 frames. ], batch size: 68, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:04:45,756 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 04:04:59,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=904190.0, ans=0.02 2024-08-11 04:05:07,061 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-11 04:05:25,431 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 04:05:29,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=904490.0, ans=0.5 2024-08-11 04:05:35,918 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 04:05:42,353 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3500, loss[loss=0.09268, beats_loss=0.009549, ecapa_loss=0.0001945, whisper_loss=0.08119, over 14299.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0116, ecapa_loss=0.0002092, whisper_loss=0.09266, over 3910171.08 frames. ], batch size: 54, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:05:43,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=904590.0, ans=0.125 2024-08-11 04:05:47,797 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 04:05:50,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-11 04:05:55,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=904690.0, ans=0.0 2024-08-11 04:06:07,205 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 04:06:07,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=904790.0, ans=0.1 2024-08-11 04:06:23,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=904890.0, ans=0.2 2024-08-11 04:06:26,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=904890.0, ans=0.0 2024-08-11 04:06:35,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.786e+01 3.047e+01 3.456e+01 6.070e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 04:06:42,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2024-08-11 04:06:44,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904990.0, ans=0.1 2024-08-11 04:06:47,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3550, loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0002339, whisper_loss=0.09005, over 22927.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01161, ecapa_loss=0.0002085, whisper_loss=0.09197, over 3919031.09 frames. 
], batch size: 94, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:06:49,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=905090.0, ans=0.2 2024-08-11 04:06:50,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=905090.0, ans=0.125 2024-08-11 04:07:08,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=905190.0, ans=0.125 2024-08-11 04:07:32,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=905390.0, ans=0.015 2024-08-11 04:07:41,367 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 04:07:46,693 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 04:07:51,984 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 04:07:53,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3600, loss[loss=0.1026, beats_loss=0.01139, ecapa_loss=0.0001936, whisper_loss=0.08927, over 17202.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01152, ecapa_loss=0.0002084, whisper_loss=0.09256, over 3891443.42 frames. ], batch size: 67, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:07:57,328 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 04:08:01,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=905590.0, ans=0.2 2024-08-11 04:08:03,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=905590.0, ans=0.0 2024-08-11 04:08:13,411 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 04:08:20,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=905790.0, ans=0.125 2024-08-11 04:08:24,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=905790.0, ans=0.125 2024-08-11 04:08:25,140 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 04:08:38,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2024-08-11 04:08:47,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.715e+01 3.031e+01 3.500e+01 1.161e+02, threshold=6.062e+01, percent-clipped=1.0 2024-08-11 04:08:59,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3650, loss[loss=0.1082, beats_loss=0.01241, ecapa_loss=0.0001875, whisper_loss=0.09394, over 17670.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01156, ecapa_loss=0.0002084, whisper_loss=0.09247, over 3896946.58 frames. ], batch size: 66, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:09:24,183 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 04:09:31,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906290.0, ans=0.1 2024-08-11 04:10:03,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-11 04:10:04,203 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3700, loss[loss=0.1153, beats_loss=0.01021, ecapa_loss=0.0002348, whisper_loss=0.1028, over 16339.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01153, ecapa_loss=0.0002097, whisper_loss=0.09262, over 3851285.62 frames. 
], batch size: 65, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:10:04,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=906590.0, ans=0.125 2024-08-11 04:10:12,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=906590.0, ans=0.2 2024-08-11 04:10:12,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=906590.0, ans=0.0 2024-08-11 04:10:29,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.35 vs. limit=22.5 2024-08-11 04:10:35,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=906790.0, ans=0.125 2024-08-11 04:10:39,270 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 04:10:47,149 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 04:10:52,143 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 04:10:58,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.695e+01 3.038e+01 3.419e+01 5.061e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-11 04:11:10,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3750, loss[loss=0.09254, beats_loss=0.01185, ecapa_loss=0.00025, whisper_loss=0.07819, over 17658.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0115, ecapa_loss=0.0002106, whisper_loss=0.09317, over 3851310.08 frames. 
], batch size: 74, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:11:11,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=907090.0, ans=0.0 2024-08-11 04:11:21,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=907090.0, ans=0.0 2024-08-11 04:11:26,390 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 04:11:29,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=907190.0, ans=0.2 2024-08-11 04:11:29,388 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2024-08-11 04:11:43,504 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 04:11:47,241 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 04:11:59,014 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 04:12:08,170 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 04:12:16,160 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3800, loss[loss=0.1085, beats_loss=0.01137, ecapa_loss=0.0002154, whisper_loss=0.09499, over 19423.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01149, ecapa_loss=0.0002113, whisper_loss=0.09337, over 3860357.71 frames. ], batch size: 78, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:12:41,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=907790.0, ans=0.04949747468305833 2024-08-11 04:12:48,394 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
19 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 04:12:59,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=907890.0, ans=0.0 2024-08-11 04:13:09,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.721e+01 2.981e+01 3.416e+01 8.567e+01, threshold=5.961e+01, percent-clipped=1.0 2024-08-11 04:13:10,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=907990.0, ans=0.125 2024-08-11 04:13:20,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=22.5 2024-08-11 04:13:22,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3850, loss[loss=0.1119, beats_loss=0.01107, ecapa_loss=0.000244, whisper_loss=0.09838, over 17857.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01154, ecapa_loss=0.0002112, whisper_loss=0.09341, over 3862624.73 frames. ], batch size: 75, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:13:35,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.77 vs. limit=22.5 2024-08-11 04:13:36,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=908190.0, ans=0.2 2024-08-11 04:13:36,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=908190.0, ans=0.2 2024-08-11 04:13:50,365 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 04:13:59,690 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 04:14:02,901 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-11 04:14:32,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3900, loss[loss=0.1171, beats_loss=0.01116, ecapa_loss=0.0002147, whisper_loss=0.1038, over 22449.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01153, ecapa_loss=0.0002124, whisper_loss=0.09409, over 3882163.55 frames. ], batch size: 91, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:14:44,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=908590.0, ans=0.0 2024-08-11 04:14:45,500 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 04:15:12,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=908790.0, ans=0.2 2024-08-11 04:15:19,287 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:15:21,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-08-11 04:15:29,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=908890.0, ans=0.125 2024-08-11 04:15:32,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.684e+01 3.033e+01 3.679e+01 6.201e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 04:15:40,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=908990.0, ans=0.2 2024-08-11 04:15:45,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 3950, loss[loss=0.1044, beats_loss=0.01254, ecapa_loss=0.0002272, whisper_loss=0.08961, over 21688.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01139, ecapa_loss=0.000212, whisper_loss=0.09487, over 3873973.82 frames. 
], batch size: 90, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:15:53,942 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 04:15:58,428 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 04:16:08,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=909190.0, ans=0.125 2024-08-11 04:16:23,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=909290.0, ans=0.05 2024-08-11 04:16:38,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=909390.0, ans=0.0 2024-08-11 04:16:51,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=909490.0, ans=0.125 2024-08-11 04:16:59,394 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4000, loss[loss=0.09023, beats_loss=0.01238, ecapa_loss=0.0002301, whisper_loss=0.07554, over 17016.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01138, ecapa_loss=0.0002117, whisper_loss=0.09508, over 3867619.43 frames. ], batch size: 72, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:17:02,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909590.0, ans=0.1 2024-08-11 04:17:14,428 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 04:17:35,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=909790.0, ans=0.025 2024-08-11 04:17:41,744 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 04:17:46,977 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-11 04:18:00,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.789e+01 3.181e+01 3.971e+01 6.202e+01, threshold=6.363e+01, percent-clipped=1.0 2024-08-11 04:18:00,969 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 04:18:05,962 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-11 04:18:15,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4050, loss[loss=0.1317, beats_loss=0.007865, ecapa_loss=0.0002712, whisper_loss=0.1211, over 18799.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.000212, whisper_loss=0.09431, over 3851985.90 frames. ], batch size: 77, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:18:15,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=910090.0, ans=0.0 2024-08-11 04:18:46,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=910290.0, ans=0.0 2024-08-11 04:18:53,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-11 04:18:55,190 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 04:18:57,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910290.0, ans=0.1 2024-08-11 04:19:00,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=910390.0, ans=0.125 2024-08-11 04:19:07,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=910390.0, ans=0.0 2024-08-11 04:19:27,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-11 04:19:30,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4100, loss[loss=0.1014, beats_loss=0.01263, ecapa_loss=0.0001672, whisper_loss=0.0871, over 21266.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01145, ecapa_loss=0.0002123, whisper_loss=0.09311, over 3849182.18 frames. ], batch size: 80, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:19:34,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=910590.0, ans=0.125 2024-08-11 04:19:49,092 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 04:19:50,336 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 04:20:03,960 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 32 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 04:20:04,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.47 vs. 
limit=15.0 2024-08-11 04:20:31,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.728e+01 2.968e+01 3.426e+01 6.142e+01, threshold=5.935e+01, percent-clipped=0.0 2024-08-11 04:20:41,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=23.18 vs. limit=22.5 2024-08-11 04:20:46,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4150, loss[loss=0.136, beats_loss=0.008818, ecapa_loss=0.0002083, whisper_loss=0.1251, over 16309.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002115, whisper_loss=0.09318, over 3824555.78 frames. ], batch size: 63, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:20:58,529 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 04:21:13,254 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 04:21:25,841 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 04:21:42,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2024-08-11 04:22:02,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4200, loss[loss=0.1195, beats_loss=0.01174, ecapa_loss=0.000229, whisper_loss=0.1055, over 22156.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.000211, whisper_loss=0.09356, over 3862027.08 frames. 
], batch size: 88, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:22:09,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=911590.0, ans=0.95 2024-08-11 04:22:19,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=911690.0, ans=0.0 2024-08-11 04:22:25,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=911690.0, ans=0.0 2024-08-11 04:22:37,920 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 04:22:41,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=911790.0, ans=0.0 2024-08-11 04:22:52,589 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 04:22:58,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=911890.0, ans=0.05 2024-08-11 04:23:00,487 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 04:23:01,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.712e+01 3.098e+01 3.462e+01 7.406e+01, threshold=6.196e+01, percent-clipped=1.0 2024-08-11 04:23:06,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=911990.0, ans=0.2 2024-08-11 04:23:08,925 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:23:13,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4250, loss[loss=0.09168, beats_loss=0.01317, ecapa_loss=0.0002397, whisper_loss=0.07611, over 20564.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01149, ecapa_loss=0.00021, whisper_loss=0.09302, over 3845068.98 frames. ], batch size: 89, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:23:19,735 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 04:23:25,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=912090.0, ans=0.125 2024-08-11 04:23:42,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=912290.0, ans=0.125 2024-08-11 04:23:53,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=912290.0, ans=0.0 2024-08-11 04:24:02,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912390.0, ans=0.1 2024-08-11 04:24:22,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4300, loss[loss=0.1123, beats_loss=0.01144, ecapa_loss=0.0002134, whisper_loss=0.09876, over 22150.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01147, ecapa_loss=0.0002088, whisper_loss=0.09285, over 3857688.82 frames. ], batch size: 90, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:24:24,044 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 04:24:24,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. 
limit=15.0 2024-08-11 04:24:26,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=912590.0, ans=0.125 2024-08-11 04:24:28,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=912590.0, ans=0.0 2024-08-11 04:24:28,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2024-08-11 04:24:29,063 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-11 04:24:38,172 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 04:24:43,633 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 04:25:01,597 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 04:25:08,561 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 04:25:11,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=912890.0, ans=0.125 2024-08-11 04:25:16,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.652e+01 2.970e+01 3.355e+01 6.636e+01, threshold=5.939e+01, percent-clipped=1.0 2024-08-11 04:25:17,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2024-08-11 04:25:18,906 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 04:25:21,495 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 04:25:25,635 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
12 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 04:25:28,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4350, loss[loss=0.1205, beats_loss=0.00858, ecapa_loss=0.0002444, whisper_loss=0.1095, over 14389.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0114, ecapa_loss=0.0002086, whisper_loss=0.09293, over 3841266.74 frames. ], batch size: 56, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:25:34,081 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 04:25:39,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-11 04:25:45,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=913190.0, ans=0.0 2024-08-11 04:25:53,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=913290.0, ans=0.1 2024-08-11 04:25:57,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-11 04:25:58,458 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 04:26:09,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-11 04:26:23,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=913490.0, ans=0.1 2024-08-11 04:26:24,510 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. 
limit=15.0 2024-08-11 04:26:34,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4400, loss[loss=0.09907, beats_loss=0.01079, ecapa_loss=0.00021, whisper_loss=0.08618, over 22838.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01136, ecapa_loss=0.0002104, whisper_loss=0.09313, over 3833811.38 frames. ], batch size: 93, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:26:40,856 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 04:27:02,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=913790.0, ans=0.2 2024-08-11 04:27:11,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=913790.0, ans=0.0 2024-08-11 04:27:13,630 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 04:27:27,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.550e+01 2.848e+01 3.646e+01 5.843e+01, threshold=5.697e+01, percent-clipped=0.0 2024-08-11 04:27:38,517 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 04:27:39,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4450, loss[loss=0.1115, beats_loss=0.01173, ecapa_loss=0.0002259, whisper_loss=0.09748, over 21226.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0002098, whisper_loss=0.09375, over 3855363.31 frames. 
], batch size: 87, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:28:05,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=914290.0, ans=0.125 2024-08-11 04:28:36,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=914490.0, ans=0.0 2024-08-11 04:28:45,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=914490.0, ans=0.0 2024-08-11 04:28:51,471 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4500, loss[loss=0.1089, beats_loss=0.01202, ecapa_loss=0.0002476, whisper_loss=0.09437, over 21654.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01144, ecapa_loss=0.00021, whisper_loss=0.09296, over 3862591.25 frames. ], batch size: 93, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:28:55,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=914590.0, ans=0.0 2024-08-11 04:29:00,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=914590.0, ans=0.125 2024-08-11 04:29:42,903 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 04:29:47,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.660e+01 3.113e+01 3.675e+01 6.136e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 04:29:49,634 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 04:29:58,981 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 04:29:59,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4550, loss[loss=0.1089, beats_loss=0.01269, ecapa_loss=0.0001984, whisper_loss=0.09427, over 19883.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01138, ecapa_loss=0.0002114, whisper_loss=0.0939, over 3882576.55 frames. ], batch size: 81, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:30:01,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=915090.0, ans=0.025 2024-08-11 04:30:11,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=915090.0, ans=0.0 2024-08-11 04:30:17,486 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 04:30:20,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=915190.0, ans=0.07 2024-08-11 04:30:33,102 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.048e-02 2024-08-11 04:30:38,431 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 04:30:41,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=915390.0, ans=0.07 2024-08-11 04:30:43,923 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
17 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 04:30:45,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=915390.0, ans=0.1 2024-08-11 04:30:50,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915390.0, ans=0.1 2024-08-11 04:30:56,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=915490.0, ans=0.125 2024-08-11 04:31:01,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=915490.0, ans=0.2 2024-08-11 04:31:03,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=915490.0, ans=0.035 2024-08-11 04:31:05,923 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4600, loss[loss=0.1161, beats_loss=0.00827, ecapa_loss=0.0002825, whisper_loss=0.105, over 20506.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01136, ecapa_loss=0.0002118, whisper_loss=0.09376, over 3895750.38 frames. ], batch size: 81, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:31:07,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-11 04:31:11,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=915590.0, ans=0.125 2024-08-11 04:31:16,842 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
22 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-11 04:31:43,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=915890.0, ans=0.125 2024-08-11 04:31:45,029 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:31:59,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.820e+01 3.108e+01 3.626e+01 5.972e+01, threshold=6.216e+01, percent-clipped=0.0 2024-08-11 04:32:05,023 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.766e+00 2024-08-11 04:32:06,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=915990.0, ans=0.0 2024-08-11 04:32:11,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4650, loss[loss=0.1101, beats_loss=0.01178, ecapa_loss=0.0001938, whisper_loss=0.09635, over 19917.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002094, whisper_loss=0.09384, over 3888879.46 frames. ], batch size: 78, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:32:11,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=916090.0, ans=0.0 2024-08-11 04:32:34,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=916190.0, ans=0.0 2024-08-11 04:32:47,362 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
15 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-11 04:32:50,291 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:32:59,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=916390.0, ans=0.125 2024-08-11 04:33:08,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=916490.0, ans=0.125 2024-08-11 04:33:12,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=916490.0, ans=0.2 2024-08-11 04:33:17,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4700, loss[loss=0.1397, beats_loss=0.009218, ecapa_loss=0.0002036, whisper_loss=0.1284, over 23986.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01144, ecapa_loss=0.0002087, whisper_loss=0.09443, over 3908744.41 frames. ], batch size: 90, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:33:23,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=916590.0, ans=0.0 2024-08-11 04:33:40,466 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 04:33:40,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=916690.0, ans=0.1 2024-08-11 04:33:41,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=916690.0, ans=0.125 2024-08-11 04:33:49,719 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 04:33:52,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916790.0, ans=0.1 2024-08-11 04:33:55,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=916790.0, ans=0.1 2024-08-11 04:34:02,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=916890.0, ans=0.125 2024-08-11 04:34:07,824 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:34:11,141 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 04:34:12,122 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.755e+01 3.145e+01 3.501e+01 4.476e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 04:34:22,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=917090.0, ans=0.0 2024-08-11 04:34:23,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4750, loss[loss=0.084, beats_loss=0.01055, ecapa_loss=0.0001873, whisper_loss=0.07157, over 15094.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01144, ecapa_loss=0.0002063, whisper_loss=0.09429, over 3889493.91 frames. 
], batch size: 58, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:34:45,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=917190.0, ans=0.0 2024-08-11 04:35:03,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=917390.0, ans=0.2 2024-08-11 04:35:03,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917390.0, ans=0.1 2024-08-11 04:35:21,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=917490.0, ans=0.125 2024-08-11 04:35:28,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4800, loss[loss=0.09552, beats_loss=0.01336, ecapa_loss=0.0002161, whisper_loss=0.08001, over 14676.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0115, ecapa_loss=0.0002072, whisper_loss=0.09382, over 3904603.58 frames. ], batch size: 59, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:35:34,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=917590.0, ans=0.1 2024-08-11 04:35:58,836 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 04:36:05,471 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 04:36:07,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=917890.0, ans=0.035 2024-08-11 04:36:09,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917890.0, ans=0.1 2024-08-11 04:36:10,589 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 04:36:10,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=917890.0, ans=0.125 2024-08-11 04:36:22,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.878e+01 3.268e+01 3.983e+01 7.610e+01, threshold=6.536e+01, percent-clipped=1.0 2024-08-11 04:36:34,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4850, loss[loss=0.09858, beats_loss=0.0119, ecapa_loss=0.0002022, whisper_loss=0.08466, over 22348.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01156, ecapa_loss=0.0002083, whisper_loss=0.09326, over 3928249.21 frames. ], batch size: 92, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:36:38,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=918090.0, ans=0.0 2024-08-11 04:36:51,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=918190.0, ans=0.2 2024-08-11 04:36:52,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=918190.0, ans=0.2 2024-08-11 04:36:58,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=918190.0, ans=0.125 2024-08-11 04:37:08,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=918290.0, ans=0.125 2024-08-11 04:37:11,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=918390.0, ans=0.0 2024-08-11 04:37:21,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=918390.0, ans=0.125 2024-08-11 04:37:37,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, 
batch_count=918590.0, ans=0.0 2024-08-11 04:37:38,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4900, loss[loss=0.109, beats_loss=0.01297, ecapa_loss=0.0001842, whisper_loss=0.09415, over 22668.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.0002095, whisper_loss=0.09319, over 3898416.38 frames. ], batch size: 90, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:37:43,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=918590.0, ans=0.125 2024-08-11 04:37:49,048 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 04:38:05,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0 2024-08-11 04:38:09,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.97 vs. limit=22.5 2024-08-11 04:38:15,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0 2024-08-11 04:38:31,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.611e+01 2.961e+01 3.443e+01 6.053e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 04:38:37,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=918990.0, ans=0.1 2024-08-11 04:38:37,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-11 04:38:43,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 4950, loss[loss=0.08728, beats_loss=0.01275, ecapa_loss=0.000183, whisper_loss=0.07271, over 16449.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01156, ecapa_loss=0.0002085, whisper_loss=0.09321, over 3855380.44 frames. ], batch size: 63, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:38:46,499 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 19 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-11 04:38:50,427 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 04:38:52,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2024-08-11 04:38:53,095 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 04:38:55,722 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-11 04:38:56,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.25 vs. limit=10.0 2024-08-11 04:39:09,066 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 04:39:14,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=919290.0, ans=0.2 2024-08-11 04:39:20,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=919290.0, ans=0.0 2024-08-11 04:39:33,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-11 04:39:53,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5000, loss[loss=0.1357, beats_loss=0.009743, ecapa_loss=0.0002283, whisper_loss=0.1237, over 17791.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01155, ecapa_loss=0.0002093, whisper_loss=0.09306, over 3836926.04 frames. 
], batch size: 70, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:39:57,436 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 04:40:31,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=919790.0, ans=0.04949747468305833 2024-08-11 04:40:34,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=919890.0, ans=0.125 2024-08-11 04:40:47,848 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 04:40:54,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.724e+01 2.983e+01 3.443e+01 5.585e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 04:41:06,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5050, loss[loss=0.1055, beats_loss=0.01157, ecapa_loss=0.0002135, whisper_loss=0.09182, over 22147.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01159, ecapa_loss=0.0002089, whisper_loss=0.09311, over 3864606.74 frames. ], batch size: 88, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:41:07,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2024-08-11 04:41:11,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=920090.0, ans=0.0 2024-08-11 04:41:16,401 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 04:41:22,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-11 04:41:28,448 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 04:41:29,708 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 04:41:37,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=920290.0, ans=0.125 2024-08-11 04:41:45,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=920290.0, ans=0.2 2024-08-11 04:41:48,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=920290.0, ans=0.1 2024-08-11 04:41:57,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920390.0, ans=0.1 2024-08-11 04:41:58,154 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 04:42:10,013 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 04:42:10,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=920490.0, ans=0.04949747468305833 2024-08-11 04:42:17,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=920590.0, ans=0.125 2024-08-11 04:42:18,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5100, loss[loss=0.07133, beats_loss=0.01156, ecapa_loss=0.0002308, whisper_loss=0.05746, over 14221.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01156, ecapa_loss=0.00021, whisper_loss=0.09336, over 3857478.81 frames. ], batch size: 57, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:42:30,079 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 04:42:50,856 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.914e-03 2024-08-11 04:42:57,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-11 04:43:00,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920890.0, ans=0.1 2024-08-11 04:43:10,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=920890.0, ans=0.1 2024-08-11 04:43:17,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.667e+01 3.175e+01 3.575e+01 5.874e+01, threshold=6.350e+01, percent-clipped=0.0 2024-08-11 04:43:28,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=920990.0, ans=0.0 2024-08-11 04:43:31,558 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5150, loss[loss=0.1303, beats_loss=0.01016, ecapa_loss=0.0001859, whisper_loss=0.1182, over 23834.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01159, ecapa_loss=0.0002089, whisper_loss=0.09358, over 3871461.83 frames. ], batch size: 91, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:43:39,995 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 04:43:58,010 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 04:43:59,701 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 04:44:16,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=921390.0, ans=0.125 2024-08-11 04:44:17,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0 2024-08-11 04:44:23,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=921390.0, ans=0.2 2024-08-11 04:44:27,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=921390.0, ans=0.125 2024-08-11 04:44:27,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=921390.0, ans=0.125 2024-08-11 04:44:34,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=921490.0, ans=0.0 2024-08-11 04:44:40,085 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 04:44:45,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5200, loss[loss=0.105, beats_loss=0.01017, ecapa_loss=0.0002293, whisper_loss=0.09252, over 12495.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01162, ecapa_loss=0.0002076, whisper_loss=0.09285, over 3830337.63 frames. ], batch size: 53, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:44:49,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=921590.0, ans=0.025 2024-08-11 04:44:50,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.04 vs. 
limit=22.5 2024-08-11 04:44:53,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=921590.0, ans=0.0 2024-08-11 04:45:23,840 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 04:45:37,112 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 04:45:42,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=921890.0, ans=0.5 2024-08-11 04:45:47,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.623e+01 2.992e+01 3.438e+01 5.362e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-11 04:46:00,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5250, loss[loss=0.1065, beats_loss=0.01292, ecapa_loss=0.0001836, whisper_loss=0.09177, over 15594.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01156, ecapa_loss=0.0002077, whisper_loss=0.09272, over 3796819.06 frames. ], batch size: 61, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:46:09,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=922090.0, ans=0.1 2024-08-11 04:46:17,161 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-11 04:46:19,936 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 04:46:53,789 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 04:47:05,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=922490.0, ans=0.125 2024-08-11 04:47:11,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=922490.0, ans=0.0 2024-08-11 04:47:13,028 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5300, loss[loss=0.1072, beats_loss=0.01277, ecapa_loss=0.0001666, whisper_loss=0.09277, over 19735.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01142, ecapa_loss=0.0002083, whisper_loss=0.09433, over 3835544.54 frames. ], batch size: 75, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:47:24,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-08-11 04:47:26,629 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 04:47:26,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=922690.0, ans=0.035 2024-08-11 04:47:29,926 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 04:47:44,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=922790.0, ans=0.0 2024-08-11 04:48:01,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=922890.0, ans=0.125 2024-08-11 04:48:07,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=15.0 2024-08-11 04:48:11,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.730e+01 3.116e+01 3.540e+01 5.766e+01, threshold=6.232e+01, percent-clipped=0.0 2024-08-11 04:48:20,438 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 9 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 04:48:22,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-08-11 04:48:24,740 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5350, loss[loss=0.1079, beats_loss=0.008855, ecapa_loss=0.0002294, whisper_loss=0.09678, over 18652.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002069, whisper_loss=0.09366, over 3806913.17 frames. ], batch size: 76, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:48:39,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-08-11 04:48:49,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=923190.0, ans=0.0 2024-08-11 04:49:00,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=923290.0, ans=0.035 2024-08-11 04:49:01,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5 2024-08-11 04:49:02,106 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 04:49:03,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=923290.0, ans=0.2 2024-08-11 04:49:09,357 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 04:49:16,179 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 04:49:30,737 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 04:49:34,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-11 04:49:36,459 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5400, loss[loss=0.1001, beats_loss=0.01402, ecapa_loss=0.0002265, whisper_loss=0.0838, over 18689.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01143, ecapa_loss=0.0002079, whisper_loss=0.09358, over 3807068.71 frames. ], batch size: 78, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:49:39,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923590.0, ans=0.1 2024-08-11 04:49:40,814 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 04:49:48,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923690.0, ans=0.1 2024-08-11 04:49:51,197 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 21 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-11 04:49:54,075 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 04:49:59,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=923690.0, ans=0.125 2024-08-11 04:50:03,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=923790.0, ans=0.125 2024-08-11 04:50:15,499 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 04:50:28,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=923890.0, ans=0.125 2024-08-11 04:50:31,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.648e+01 2.918e+01 3.540e+01 6.193e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 04:50:32,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2024-08-11 04:50:38,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923990.0, ans=0.1 2024-08-11 04:50:39,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=923990.0, ans=0.0 2024-08-11 04:50:43,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5450, loss[loss=0.1197, beats_loss=0.0116, ecapa_loss=0.0002019, whisper_loss=0.1061, over 23157.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01149, ecapa_loss=0.0002076, whisper_loss=0.09375, over 3858849.34 frames. ], batch size: 91, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:50:46,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=924090.0, ans=0.09899494936611666 2024-08-11 04:50:49,954 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 04:50:50,439 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. 
limit=6.0 2024-08-11 04:51:01,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924190.0, ans=0.0 2024-08-11 04:51:10,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-11 04:51:14,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=924290.0, ans=0.0 2024-08-11 04:51:16,962 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 04:51:35,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=924390.0, ans=0.125 2024-08-11 04:51:37,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=22.5 2024-08-11 04:51:50,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5500, loss[loss=0.09917, beats_loss=0.01488, ecapa_loss=0.0001813, whisper_loss=0.08248, over 16806.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0002069, whisper_loss=0.09427, over 3879838.25 frames. ], batch size: 68, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:51:51,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=924590.0, ans=0.125 2024-08-11 04:51:56,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=924590.0, ans=0.125 2024-08-11 04:51:59,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.53 vs. 
limit=22.5 2024-08-11 04:52:18,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=924790.0, ans=0.125 2024-08-11 04:52:20,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-08-11 04:52:23,627 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 04:52:34,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=924890.0, ans=0.0 2024-08-11 04:52:44,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.644e+01 3.103e+01 3.543e+01 6.260e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 04:52:56,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5550, loss[loss=0.09539, beats_loss=0.01099, ecapa_loss=0.0001817, whisper_loss=0.08259, over 19144.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002081, whisper_loss=0.09392, over 3892620.92 frames. ], batch size: 73, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:52:56,267 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 04:52:58,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. 
limit=15.0 2024-08-11 04:53:00,687 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.166e-01 2024-08-11 04:53:08,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=925190.0, ans=0.125 2024-08-11 04:53:20,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-11 04:53:22,650 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 04:53:34,877 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2024-08-11 04:53:39,288 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 04:53:46,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-08-11 04:53:46,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2024-08-11 04:53:51,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925490.0, ans=0.1 2024-08-11 04:54:01,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5600, loss[loss=0.1053, beats_loss=0.01215, ecapa_loss=0.0002165, whisper_loss=0.09094, over 20918.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01151, ecapa_loss=0.000206, whisper_loss=0.0939, over 3896108.18 frames. ], batch size: 88, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:54:04,285 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
21 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 04:54:09,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=925590.0, ans=0.125 2024-08-11 04:54:09,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=925590.0, ans=0.125 2024-08-11 04:54:13,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=925690.0, ans=0.2 2024-08-11 04:54:14,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=925690.0, ans=0.2 2024-08-11 04:54:17,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=925690.0, ans=0.05 2024-08-11 04:54:29,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=925790.0, ans=0.125 2024-08-11 04:54:42,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=925890.0, ans=0.125 2024-08-11 04:54:48,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=925890.0, ans=0.1 2024-08-11 04:54:51,194 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 04:54:51,340 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:54:54,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.711e+01 3.123e+01 3.568e+01 9.227e+01, threshold=6.245e+01, percent-clipped=1.0 2024-08-11 04:54:59,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. 
limit=15.0
2024-08-11 04:55:05,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5650, loss[loss=0.08477, beats_loss=0.01411, ecapa_loss=0.0002133, whisper_loss=0.06853, over 15881.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01153, ecapa_loss=0.0002071, whisper_loss=0.09378, over 3924697.33 frames. ], batch size: 62, lr: 9.08e-03, grad_scale: 2251799813685248.0
2024-08-11 04:55:11,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=926090.0, ans=0.125
2024-08-11 04:55:12,540 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS
2024-08-11 04:55:29,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926190.0, ans=0.1
2024-08-11 04:55:30,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=926290.0, ans=0.2
2024-08-11 04:55:42,358 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 from AS
2024-08-11 04:56:03,561 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS
2024-08-11 04:56:06,095 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 from AS
2024-08-11 04:56:06,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=926490.0, ans=0.125
2024-08-11 04:56:07,324 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS
2024-08-11 04:56:10,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5700, loss[loss=0.119, beats_loss=0.01013, ecapa_loss=0.0001961, whisper_loss=0.1069, over 20978.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01158, ecapa_loss=0.0002073, whisper_loss=0.09383, over 3945032.44 frames.
], batch size: 82, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:56:11,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=926590.0, ans=0.125
2024-08-11 04:56:12,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=926590.0, ans=0.0
2024-08-11 04:56:20,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2024-08-11 04:56:31,755 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS
2024-08-11 04:56:35,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=12.0
2024-08-11 04:56:49,793 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 from AS
2024-08-11 04:56:59,246 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 04:57:01,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=926990.0, ans=0.0
2024-08-11 04:57:03,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.796e+01 3.057e+01 3.549e+01 5.833e+01, threshold=6.113e+01, percent-clipped=0.0
2024-08-11 04:57:07,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=926990.0, ans=22.5
2024-08-11 04:57:15,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5750, loss[loss=0.09305, beats_loss=0.01396, ecapa_loss=0.0001712, whisper_loss=0.07738, over 17539.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01156, ecapa_loss=0.0002069, whisper_loss=0.09434, over 3935049.19 frames.
], batch size: 70, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:57:17,391 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 04:57:31,323 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 12 from Vox, 35 from AS
2024-08-11 04:57:48,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0
2024-08-11 04:57:50,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0
2024-08-11 04:57:59,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=12.0
2024-08-11 04:58:05,815 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS
2024-08-11 04:58:07,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0
2024-08-11 04:58:21,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5800, loss[loss=0.1031, beats_loss=0.01313, ecapa_loss=0.0001795, whisper_loss=0.08818, over 23282.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01159, ecapa_loss=0.0002074, whisper_loss=0.09349, over 3919424.81 frames. ], batch size: 92, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:58:24,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=927590.0, ans=0.0
2024-08-11 04:58:35,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=927690.0, ans=0.125
2024-08-11 04:58:49,863 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts.
22 from LS+wenet, 20 from Vox, 26 from AS
2024-08-11 04:58:57,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0
2024-08-11 04:59:01,851 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-11 04:59:06,675 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-11 04:59:14,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.689e+01 2.933e+01 3.272e+01 5.873e+01, threshold=5.865e+01, percent-clipped=0.0
2024-08-11 04:59:19,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=927990.0, ans=0.125
2024-08-11 04:59:25,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5850, loss[loss=0.1083, beats_loss=0.01287, ecapa_loss=0.0002084, whisper_loss=0.09335, over 22821.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01158, ecapa_loss=0.0002077, whisper_loss=0.09345, over 3924431.65 frames. ], batch size: 94, lr: 9.07e-03, grad_scale: 2251799813685248.0
2024-08-11 04:59:31,003 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 26 from Vox, 26 from AS
2024-08-11 04:59:31,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=928090.0, ans=10.0
2024-08-11 04:59:34,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=928090.0, ans=0.0
2024-08-11 04:59:40,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=928190.0, ans=0.2
2024-08-11 04:59:45,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs.
limit=15.0
2024-08-11 04:59:48,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=928190.0, ans=0.0
2024-08-11 04:59:50,528 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 from AS
2024-08-11 04:59:54,251 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 from AS
2024-08-11 05:00:06,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=928390.0, ans=0.0
2024-08-11 05:00:21,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=12.0
2024-08-11 05:00:30,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5900, loss[loss=0.1131, beats_loss=0.01067, ecapa_loss=0.0002256, whisper_loss=0.1002, over 23487.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01157, ecapa_loss=0.0002068, whisper_loss=0.09371, over 3909990.86 frames. ], batch size: 93, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:00:35,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=928590.0, ans=0.1
2024-08-11 05:00:51,760 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 from AS
2024-08-11 05:00:57,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5
2024-08-11 05:00:58,762 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-11 05:01:02,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=928790.0, ans=0.125
2024-08-11 05:01:08,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2024-08-11 05:01:24,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.592e+01 2.867e+01 3.350e+01 5.876e+01, threshold=5.735e+01, percent-clipped=1.0
2024-08-11 05:01:31,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=928990.0, ans=0.1
2024-08-11 05:01:32,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=928990.0, ans=0.0
2024-08-11 05:01:36,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 5950, loss[loss=0.1054, beats_loss=0.01035, ecapa_loss=0.0002128, whisper_loss=0.09289, over 14659.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01164, ecapa_loss=0.0002052, whisper_loss=0.09334, over 3905875.16 frames. ], batch size: 56, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:01:49,293 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 from AS
2024-08-11 05:02:04,921 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 from AS
2024-08-11 05:02:09,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=929290.0, ans=0.0
2024-08-11 05:02:11,803 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 from AS
2024-08-11 05:02:41,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6000, loss[loss=0.1172, beats_loss=0.01144, ecapa_loss=0.0002028, whisper_loss=0.1037, over 14989.00 frames.
], tot_loss[loss=0.1075, beats_loss=0.01154, ecapa_loss=0.0002065, whisper_loss=0.09393, over 3874044.75 frames. ], batch size: 59, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:02:41,666 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-11 05:03:05,335 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8217, 1.7026, 3.4307, 3.2329], device='cuda:1')
2024-08-11 05:03:21,189 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on ASR_libri: loss=0.2594, beats_loss=0, ecapa_loss=0.0006753, whisper_loss=0.2527, over 922467.00 frames.
2024-08-11 05:03:38,314 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on SV_voxceleb1: loss=0.005594, beats_loss=0, ecapa_loss=0.0005594, whisper_loss=0, over 939242.00 frames.
2024-08-11 05:04:19,380 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4730, 2.7606, 2.0048, 3.0389], device='cuda:1')
2024-08-11 05:05:33,471 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 05:05:33,475 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-11 05:05:34,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0
2024-08-11 05:05:44,910 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 27 from Vox, 33 from AS
2024-08-11 05:05:49,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs.
limit=15.0
2024-08-11 05:06:15,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929890.0, ans=0.1
2024-08-11 05:06:16,524 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-11 05:06:27,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.542e+01 2.913e+01 3.356e+01 5.863e+01, threshold=5.826e+01, percent-clipped=1.0
2024-08-11 05:06:32,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=929990.0, ans=0.125
2024-08-11 05:06:38,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6050, loss[loss=0.1059, beats_loss=0.01062, ecapa_loss=0.0002801, whisper_loss=0.09246, over 20470.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01159, ecapa_loss=0.000206, whisper_loss=0.09321, over 3884862.51 frames. ], batch size: 86, lr: 9.06e-03, grad_scale: 2251799813685248.0
2024-08-11 05:06:49,280 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 from AS
2024-08-11 05:06:55,615 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-11 05:06:58,430 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 from AS
2024-08-11 05:07:03,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=930190.0, ans=10.0
2024-08-11 05:07:08,917 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts.
20 from LS+wenet, 25 from Vox, 29 from AS
2024-08-11 05:07:22,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=930390.0, ans=0.0
2024-08-11 05:07:28,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=930390.0, ans=0.0
2024-08-11 05:07:43,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=930590.0, ans=0.125
2024-08-11 05:07:43,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6100, loss[loss=0.1134, beats_loss=0.009987, ecapa_loss=0.0002192, whisper_loss=0.1012, over 18398.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01161, ecapa_loss=0.0002077, whisper_loss=0.0924, over 3891097.99 frames. ], batch size: 72, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:07:51,421 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS
2024-08-11 05:08:06,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=930690.0, ans=0.0
2024-08-11 05:08:37,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.611e+01 2.902e+01 3.349e+01 2.714e+02, threshold=5.803e+01, percent-clipped=1.0
2024-08-11 05:08:48,989 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6150, loss[loss=0.1075, beats_loss=0.01088, ecapa_loss=0.0001717, whisper_loss=0.09494, over 21289.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01157, ecapa_loss=0.000208, whisper_loss=0.0929, over 3896490.79 frames. ], batch size: 83, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:09:03,731 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 from AS
2024-08-11 05:09:15,667 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts.
32 from LS+wenet, 18 from Vox, 33 from AS
2024-08-11 05:09:30,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=931390.0, ans=0.125
2024-08-11 05:09:53,463 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS
2024-08-11 05:09:54,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6200, loss[loss=0.1188, beats_loss=0.01118, ecapa_loss=0.0002118, whisper_loss=0.1055, over 22003.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01149, ecapa_loss=0.0002085, whisper_loss=0.09356, over 3897956.89 frames. ], batch size: 89, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:10:07,543 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 15 from Vox, 22 from AS
2024-08-11 05:10:14,238 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 from AS
2024-08-11 05:10:27,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=931790.0, ans=0.0
2024-08-11 05:10:31,364 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 05:10:38,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=931890.0, ans=0.2
2024-08-11 05:10:48,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 2.725e+01 3.050e+01 3.372e+01 5.411e+01, threshold=6.100e+01, percent-clipped=0.0
2024-08-11 05:10:50,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=931990.0, ans=0.2
2024-08-11 05:10:50,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.82 vs.
limit=12.0
2024-08-11 05:11:00,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6250, loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001947, whisper_loss=0.09007, over 19916.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01146, ecapa_loss=0.0002106, whisper_loss=0.09354, over 3898270.88 frames. ], batch size: 78, lr: 9.05e-03, grad_scale: 2251799813685248.0
2024-08-11 05:11:12,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=932190.0, ans=0.125
2024-08-11 05:11:16,243 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 from AS
2024-08-11 05:11:24,011 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 from AS
2024-08-11 05:11:24,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=932190.0, ans=0.125
2024-08-11 05:11:38,502 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 05:11:45,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=932390.0, ans=0.0
2024-08-11 05:11:45,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs.
limit=10.0
2024-08-11 05:11:46,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=932390.0, ans=0.125
2024-08-11 05:11:55,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=932490.0, ans=0.125
2024-08-11 05:11:59,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932490.0, ans=0.1
2024-08-11 05:12:05,548 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6300, loss[loss=0.1217, beats_loss=0.009007, ecapa_loss=0.0001869, whisper_loss=0.1108, over 18652.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01145, ecapa_loss=0.0002097, whisper_loss=0.09385, over 3898049.76 frames. ], batch size: 70, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:12:08,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0
2024-08-11 05:12:24,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932690.0, ans=0.1
2024-08-11 05:12:25,789 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS
2024-08-11 05:12:37,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=932790.0, ans=0.07
2024-08-11 05:12:53,095 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
31 from LS+wenet, 17 from Vox, 41 from AS
2024-08-11 05:12:59,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.671e+01 3.003e+01 3.406e+01 5.856e+01, threshold=6.007e+01, percent-clipped=0.0
2024-08-11 05:13:11,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6350, loss[loss=0.1133, beats_loss=0.01424, ecapa_loss=0.0001918, whisper_loss=0.09709, over 19218.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01151, ecapa_loss=0.0002097, whisper_loss=0.09346, over 3895492.76 frames. ], batch size: 79, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:13:21,448 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 from AS
2024-08-11 05:13:25,487 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 from AS
2024-08-11 05:13:52,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5
2024-08-11 05:13:54,804 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.213e+00
2024-08-11 05:14:02,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=933390.0, ans=0.125
2024-08-11 05:14:07,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=933490.0, ans=0.125
2024-08-11 05:14:12,653 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 from AS
2024-08-11 05:14:15,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933490.0, ans=0.1
2024-08-11 05:14:16,671 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts.
27 from LS+wenet, 23 from Vox, 22 from AS
2024-08-11 05:14:16,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=933490.0, ans=0.0
2024-08-11 05:14:21,233 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6400, loss[loss=0.0786, beats_loss=0.0112, ecapa_loss=0.0002702, whisper_loss=0.0647, over 18403.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01142, ecapa_loss=0.0002096, whisper_loss=0.09463, over 3912066.18 frames. ], batch size: 81, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:14:26,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933590.0, ans=0.1
2024-08-11 05:14:33,555 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS
2024-08-11 05:14:45,758 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS
2024-08-11 05:14:48,867 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 from AS
2024-08-11 05:14:53,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=933790.0, ans=0.04949747468305833
2024-08-11 05:15:17,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.766e+01 3.115e+01 3.539e+01 7.313e+01, threshold=6.229e+01, percent-clipped=3.0
2024-08-11 05:15:21,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=933990.0, ans=0.025
2024-08-11 05:15:29,558 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6450, loss[loss=0.1177, beats_loss=0.009778, ecapa_loss=0.0002242, whisper_loss=0.1057, over 18199.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01155, ecapa_loss=0.0002085, whisper_loss=0.09376, over 3911633.02 frames.
], batch size: 72, lr: 9.04e-03, grad_scale: 2251799813685248.0
2024-08-11 05:16:14,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=934390.0, ans=0.125
2024-08-11 05:16:14,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=934390.0, ans=0.0
2024-08-11 05:16:24,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=934390.0, ans=0.0
2024-08-11 05:16:25,436 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 14 from Vox, 48 from AS
2024-08-11 05:16:26,989 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 19 from Vox, 35 from AS
2024-08-11 05:16:33,143 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 26 from LS+wenet, 7 from Vox, 26 from AS
2024-08-11 05:16:42,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6500, loss[loss=0.1186, beats_loss=0.01339, ecapa_loss=0.0001546, whisper_loss=0.1037, over 23451.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01155, ecapa_loss=0.0002074, whisper_loss=0.09457, over 3923905.09 frames. ], batch size: 90, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:16:44,273 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 27 from LS+wenet, 27 from Vox, 42 from AS
2024-08-11 05:16:51,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=934590.0, ans=0.09899494936611666
2024-08-11 05:16:54,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs.
limit=15.0
2024-08-11 05:16:58,058 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.866e-01
2024-08-11 05:17:18,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=934790.0, ans=0.125
2024-08-11 05:17:22,767 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS
2024-08-11 05:17:29,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934890.0, ans=0.125
2024-08-11 05:17:34,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=934890.0, ans=0.0
2024-08-11 05:17:41,384 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 05:17:42,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.816e+01 3.248e+01 3.661e+01 5.361e+01, threshold=6.497e+01, percent-clipped=0.0
2024-08-11 05:17:55,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6550, loss[loss=0.11, beats_loss=0.01174, ecapa_loss=0.0002276, whisper_loss=0.09595, over 15107.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01153, ecapa_loss=0.0002061, whisper_loss=0.09459, over 3929166.74 frames. ], batch size: 61, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:18:03,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5
2024-08-11 05:18:14,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=935190.0, ans=0.0
2024-08-11 05:18:28,789 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts.
25 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 05:18:30,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=935290.0, ans=0.125
2024-08-11 05:18:49,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=935390.0, ans=0.125
2024-08-11 05:19:06,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=935490.0, ans=0.125
2024-08-11 05:19:11,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6600, loss[loss=0.1246, beats_loss=0.01025, ecapa_loss=0.00019, whisper_loss=0.1124, over 22644.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01141, ecapa_loss=0.0002102, whisper_loss=0.09482, over 3940218.10 frames. ], batch size: 90, lr: 9.03e-03, grad_scale: 2251799813685248.0
2024-08-11 05:19:11,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=935590.0, ans=0.05
2024-08-11 05:19:19,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=935590.0, ans=0.125
2024-08-11 05:19:21,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=15.0
2024-08-11 05:19:27,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=935690.0, ans=0.1
2024-08-11 05:19:29,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs.
limit=10.0
2024-08-11 05:19:34,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=935690.0, ans=0.2
2024-08-11 05:19:34,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=935690.0, ans=0.1
2024-08-11 05:20:00,877 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5
2024-08-11 05:20:01,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=935890.0, ans=0.125
2024-08-11 05:20:04,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=935890.0, ans=0.125
2024-08-11 05:20:05,892 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS
2024-08-11 05:20:11,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.766e+01 3.102e+01 3.582e+01 5.637e+01, threshold=6.205e+01, percent-clipped=0.0
2024-08-11 05:20:13,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=935990.0, ans=0.2
2024-08-11 05:20:25,128 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6650, loss[loss=0.1018, beats_loss=0.009803, ecapa_loss=0.0001921, whisper_loss=0.09007, over 15023.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002104, whisper_loss=0.09392, over 3913500.56 frames.
], batch size: 58, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:20:28,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=936090.0, ans=0.125 2024-08-11 05:20:28,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=936090.0, ans=0.5 2024-08-11 05:20:51,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936190.0, ans=0.1 2024-08-11 05:21:21,098 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 05:21:25,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=936490.0, ans=0.125 2024-08-11 05:21:37,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-11 05:21:38,595 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 05:21:40,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6700, loss[loss=0.1157, beats_loss=0.01153, ecapa_loss=0.0001521, whisper_loss=0.1027, over 20035.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01139, ecapa_loss=0.0002107, whisper_loss=0.09467, over 3955188.17 frames. ], batch size: 75, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:21:43,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=936590.0, ans=0.2 2024-08-11 05:21:55,937 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 05:22:16,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=936790.0, ans=0.1 2024-08-11 05:22:23,213 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 05:22:27,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=936890.0, ans=0.125 2024-08-11 05:22:39,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.752e+01 3.187e+01 3.868e+01 6.125e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-11 05:22:50,153 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 05:22:52,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6750, loss[loss=0.1025, beats_loss=0.01306, ecapa_loss=0.0001897, whisper_loss=0.08751, over 20483.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01142, ecapa_loss=0.0002113, whisper_loss=0.09462, over 3935371.70 frames. ], batch size: 83, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:22:56,324 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:23:32,619 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-11 05:23:35,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=937390.0, ans=0.125 2024-08-11 05:23:45,540 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 10 from Vox, 42 fro AS 2024-08-11 05:23:46,835 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 05:23:47,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937390.0, ans=0.1 2024-08-11 05:23:51,169 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 05:24:06,328 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6800, loss[loss=0.08606, beats_loss=0.01449, ecapa_loss=0.0001713, whisper_loss=0.06986, over 22346.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01147, ecapa_loss=0.0002106, whisper_loss=0.09462, over 3940064.40 frames. ], batch size: 94, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:24:09,556 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:24:23,325 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 05:24:26,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937690.0, ans=0.1 2024-08-11 05:24:26,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=937690.0, ans=0.0 2024-08-11 05:24:34,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=937790.0, ans=22.5 2024-08-11 05:24:35,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937790.0, ans=0.1 2024-08-11 05:24:37,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. 
limit=15.0 2024-08-11 05:24:39,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=937790.0, ans=0.125 2024-08-11 05:24:51,202 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 05:25:05,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.742e+01 3.088e+01 3.392e+01 5.512e+01, threshold=6.176e+01, percent-clipped=0.0 2024-08-11 05:25:18,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6850, loss[loss=0.1092, beats_loss=0.009154, ecapa_loss=0.0002274, whisper_loss=0.09779, over 17167.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01148, ecapa_loss=0.0002116, whisper_loss=0.09405, over 3936343.25 frames. ], batch size: 69, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:25:19,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=938090.0, ans=0.125 2024-08-11 05:25:34,217 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 05:25:35,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=12.0 2024-08-11 05:25:57,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=938290.0, ans=0.1 2024-08-11 05:25:58,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=938290.0, ans=0.125 2024-08-11 05:26:00,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=938290.0, ans=0.125 2024-08-11 05:26:06,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.03 vs. 
limit=22.5 2024-08-11 05:26:19,132 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 05:26:33,287 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6900, loss[loss=0.1264, beats_loss=0.009021, ecapa_loss=0.000208, whisper_loss=0.1153, over 21043.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01149, ecapa_loss=0.0002099, whisper_loss=0.0942, over 3929385.72 frames. ], batch size: 83, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:26:56,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=938690.0, ans=0.0 2024-08-11 05:27:01,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=938690.0, ans=0.125 2024-08-11 05:27:06,170 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 05:27:07,872 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 05:27:09,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=938790.0, ans=0.125 2024-08-11 05:27:10,668 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 05:27:15,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=938790.0, ans=0.125 2024-08-11 05:27:18,327 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 05:27:19,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938890.0, ans=0.1 2024-08-11 05:27:34,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.641e+01 3.049e+01 3.440e+01 6.351e+01, threshold=6.099e+01, percent-clipped=1.0 2024-08-11 05:27:38,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=938990.0, ans=0.1 2024-08-11 05:27:42,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=938990.0, ans=0.0 2024-08-11 05:27:45,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.99 vs. limit=10.0 2024-08-11 05:27:48,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 6950, loss[loss=0.09062, beats_loss=0.01356, ecapa_loss=0.0001799, whisper_loss=0.07526, over 21916.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0115, ecapa_loss=0.0002083, whisper_loss=0.09451, over 3942665.28 frames. ], batch size: 88, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:28:03,116 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 05:28:18,828 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-11 05:28:20,821 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 05:28:26,461 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 05:28:32,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=939390.0, ans=0.09899494936611666 2024-08-11 05:28:56,029 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 05:28:59,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.18 vs. limit=15.0 2024-08-11 05:29:01,198 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7000, loss[loss=0.09842, beats_loss=0.01109, ecapa_loss=0.0002473, whisper_loss=0.08486, over 16697.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01151, ecapa_loss=0.0002086, whisper_loss=0.09431, over 3895787.03 frames. ], batch size: 69, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:29:12,917 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 05:29:14,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=939690.0, ans=0.2 2024-08-11 05:29:31,455 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 05:29:32,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2024-08-11 05:29:54,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=939890.0, ans=0.0 2024-08-11 05:29:55,940 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 05:29:59,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.694e+01 2.915e+01 3.195e+01 8.375e+01, threshold=5.830e+01, percent-clipped=1.0 2024-08-11 05:30:02,733 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 05:30:05,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=939990.0, ans=0.1 2024-08-11 05:30:11,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7050, loss[loss=0.1209, beats_loss=0.009774, ecapa_loss=0.0002148, whisper_loss=0.109, over 22053.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0115, ecapa_loss=0.0002072, whisper_loss=0.09425, over 3861085.64 frames. ], batch size: 88, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:30:27,222 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 05:30:33,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=940190.0, ans=0.125 2024-08-11 05:30:39,001 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 05:30:45,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=940290.0, ans=0.07 2024-08-11 05:30:57,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=940390.0, ans=0.125 2024-08-11 05:31:04,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=940390.0, ans=0.07 2024-08-11 05:31:10,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=940490.0, ans=0.0 2024-08-11 05:31:12,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=940490.0, ans=0.09899494936611666 2024-08-11 05:31:15,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=940490.0, ans=0.2 2024-08-11 05:31:16,889 INFO [train_multi_KD3.py:844] (1/4) A 
total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 05:31:22,138 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7100, loss[loss=0.09335, beats_loss=0.01045, ecapa_loss=0.0002238, whisper_loss=0.08067, over 13822.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01144, ecapa_loss=0.0002053, whisper_loss=0.09465, over 3875134.44 frames. ], batch size: 55, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:31:40,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=940690.0, ans=0.2 2024-08-11 05:31:54,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=22.5 2024-08-11 05:32:00,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940790.0, ans=0.1 2024-08-11 05:32:06,504 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 05:32:09,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940890.0, ans=0.1 2024-08-11 05:32:15,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=940890.0, ans=0.125 2024-08-11 05:32:15,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=940890.0, ans=0.0 2024-08-11 05:32:18,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=940990.0, ans=0.125 2024-08-11 05:32:20,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.694e+01 2.982e+01 3.283e+01 5.309e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 05:32:33,905 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7150, loss[loss=0.1136, beats_loss=0.01101, ecapa_loss=0.0002781, whisper_loss=0.09979, over 21915.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01154, ecapa_loss=0.0002039, whisper_loss=0.09368, over 3889345.83 frames. ], batch size: 96, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:32:52,794 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 05:33:18,391 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 05:33:50,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7200, loss[loss=0.09886, beats_loss=0.01312, ecapa_loss=0.0002225, whisper_loss=0.08352, over 17615.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01157, ecapa_loss=0.0002058, whisper_loss=0.09268, over 3896148.31 frames. 
], batch size: 70, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:33:50,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=941590.0, ans=0.0 2024-08-11 05:34:26,218 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 05:34:31,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=941790.0, ans=0.025 2024-08-11 05:34:34,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-11 05:34:53,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.691e+01 3.038e+01 3.510e+01 5.388e+01, threshold=6.075e+01, percent-clipped=0.0 2024-08-11 05:35:06,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7250, loss[loss=0.1188, beats_loss=0.01209, ecapa_loss=0.000205, whisper_loss=0.1047, over 16465.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01164, ecapa_loss=0.0002043, whisper_loss=0.09296, over 3933414.78 frames. ], batch size: 66, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:35:07,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2024-08-11 05:35:15,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=942090.0, ans=0.2 2024-08-11 05:35:17,057 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 05:35:22,042 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 05:35:29,400 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
23 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 05:35:29,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=942190.0, ans=0.125 2024-08-11 05:35:41,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942290.0, ans=0.1 2024-08-11 05:35:51,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942390.0, ans=0.1 2024-08-11 05:35:57,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=942390.0, ans=0.09899494936611666 2024-08-11 05:36:06,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=942390.0, ans=0.125 2024-08-11 05:36:22,739 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 05:36:24,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7300, loss[loss=0.1242, beats_loss=0.01126, ecapa_loss=0.0001829, whisper_loss=0.1111, over 22618.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01159, ecapa_loss=0.0002067, whisper_loss=0.09349, over 3945543.27 frames. 
], batch size: 87, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:36:32,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=942590.0, ans=0.125 2024-08-11 05:36:40,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=942690.0, ans=0.125 2024-08-11 05:36:51,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=942690.0, ans=15.0 2024-08-11 05:36:56,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=15.0 2024-08-11 05:36:58,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942790.0, ans=0.1 2024-08-11 05:37:00,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=942790.0, ans=0.0 2024-08-11 05:37:08,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. 
limit=22.5 2024-08-11 05:37:14,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=942890.0, ans=0.04949747468305833 2024-08-11 05:37:28,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.619e+01 2.865e+01 3.274e+01 5.323e+01, threshold=5.731e+01, percent-clipped=0.0 2024-08-11 05:37:28,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=942990.0, ans=15.0 2024-08-11 05:37:36,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=942990.0, ans=0.125 2024-08-11 05:37:42,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7350, loss[loss=0.1271, beats_loss=0.01005, ecapa_loss=0.0002158, whisper_loss=0.1149, over 22372.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01142, ecapa_loss=0.0002078, whisper_loss=0.09424, over 3925135.24 frames. ], batch size: 89, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:38:07,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=943190.0, ans=12.0 2024-08-11 05:38:11,029 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 05:38:14,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943290.0, ans=0.1 2024-08-11 05:38:21,797 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2024-08-11 05:38:33,394 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 05:38:34,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=943390.0, ans=0.125 2024-08-11 05:38:38,116 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 05:38:44,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=943390.0, ans=0.0 2024-08-11 05:38:47,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=943490.0, ans=0.125 2024-08-11 05:38:50,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=943490.0, ans=0.0 2024-08-11 05:38:52,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.97 vs. limit=22.5 2024-08-11 05:39:04,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7400, loss[loss=0.1123, beats_loss=0.01195, ecapa_loss=0.0002295, whisper_loss=0.09809, over 22658.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002076, whisper_loss=0.09394, over 3931861.66 frames. ], batch size: 95, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:39:13,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0 2024-08-11 05:39:34,930 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-11 05:39:35,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=943690.0, ans=0.0 2024-08-11 05:40:03,543 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 05:40:12,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.754e+01 3.134e+01 3.578e+01 6.308e+01, threshold=6.268e+01, percent-clipped=2.0 2024-08-11 05:40:22,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=943990.0, ans=0.125 2024-08-11 05:40:27,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7450, loss[loss=0.08853, beats_loss=0.01274, ecapa_loss=0.000232, whisper_loss=0.07347, over 19264.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01155, ecapa_loss=0.0002072, whisper_loss=0.09345, over 3958602.78 frames. ], batch size: 85, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:40:40,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-11 05:41:11,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=944290.0, ans=0.1 2024-08-11 05:41:25,627 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 05:41:29,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=944390.0, ans=0.125 2024-08-11 05:41:36,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=944490.0, ans=0.125 2024-08-11 05:41:50,898 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7500, loss[loss=0.1211, beats_loss=0.01114, ecapa_loss=0.0002122, whisper_loss=0.1079, over 17792.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0115, ecapa_loss=0.0002078, whisper_loss=0.09344, over 3927370.27 frames. 
], batch size: 69, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:41:58,447 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 05:42:05,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=944690.0, ans=0.0 2024-08-11 05:42:10,970 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 05:42:18,635 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 05:42:29,731 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 05:42:39,039 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 05:42:41,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=944890.0, ans=0.125 2024-08-11 05:42:54,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.583e+01 2.883e+01 3.295e+01 6.050e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 05:42:57,533 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 05:43:08,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7550, loss[loss=0.09674, beats_loss=0.01401, ecapa_loss=0.0001606, whisper_loss=0.08112, over 22556.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01149, ecapa_loss=0.0002088, whisper_loss=0.09349, over 3891150.63 frames. ], batch size: 89, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:43:16,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=945090.0, ans=0.125 2024-08-11 05:43:23,454 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 05:43:41,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945290.0, ans=0.1 2024-08-11 05:43:54,161 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 05:43:57,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=945390.0, ans=0.125 2024-08-11 05:44:22,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=945490.0, ans=0.1 2024-08-11 05:44:25,158 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7600, loss[loss=0.1036, beats_loss=0.01177, ecapa_loss=0.0002136, whisper_loss=0.08973, over 17486.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01145, ecapa_loss=0.0002096, whisper_loss=0.09277, over 3866235.58 frames. ], batch size: 73, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:44:35,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=945590.0, ans=0.1 2024-08-11 05:44:47,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=945690.0, ans=0.125 2024-08-11 05:45:05,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=945790.0, ans=0.125 2024-08-11 05:45:17,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. 
limit=6.0 2024-08-11 05:45:21,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=945890.0, ans=0.0 2024-08-11 05:45:27,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.611e+01 2.976e+01 3.513e+01 5.739e+01, threshold=5.952e+01, percent-clipped=0.0 2024-08-11 05:45:28,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-11 05:45:41,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7650, loss[loss=0.08607, beats_loss=0.01079, ecapa_loss=0.000228, whisper_loss=0.073, over 15949.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01127, ecapa_loss=0.0002102, whisper_loss=0.09382, over 3853963.53 frames. ], batch size: 66, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:45:44,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=946090.0, ans=0.125 2024-08-11 05:45:48,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=946090.0, ans=0.2 2024-08-11 05:45:52,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=946090.0, ans=0.0 2024-08-11 05:45:52,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=946090.0, ans=0.125 2024-08-11 05:45:56,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-11 05:46:15,863 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 05:46:17,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=946290.0, ans=0.0 2024-08-11 05:46:49,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=946490.0, ans=0.125 2024-08-11 05:46:57,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=946590.0, ans=0.1 2024-08-11 05:46:58,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7700, loss[loss=0.1179, beats_loss=0.01061, ecapa_loss=0.0001941, whisper_loss=0.1054, over 23258.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01128, ecapa_loss=0.0002089, whisper_loss=0.09407, over 3892956.21 frames. ], batch size: 90, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:47:50,538 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 05:47:51,802 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 05:47:58,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=946890.0, ans=0.125 2024-08-11 05:48:03,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.753e+01 2.991e+01 3.515e+01 5.898e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 05:48:04,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.10 vs. 
limit=15.0 2024-08-11 05:48:13,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=946990.0, ans=0.125 2024-08-11 05:48:17,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=947090.0, ans=0.1 2024-08-11 05:48:17,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7750, loss[loss=0.09835, beats_loss=0.0134, ecapa_loss=0.0002068, whisper_loss=0.08287, over 21431.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002074, whisper_loss=0.09362, over 3889203.21 frames. ], batch size: 89, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:48:21,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-08-11 05:48:28,549 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 05:48:29,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=947090.0, ans=15.0 2024-08-11 05:48:36,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=947190.0, ans=0.125 2024-08-11 05:48:37,558 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 05:48:44,896 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
21 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-11 05:48:45,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=947190.0, ans=0.1 2024-08-11 05:48:52,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=947290.0, ans=0.125 2024-08-11 05:49:01,843 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 05:49:20,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=947490.0, ans=0.125 2024-08-11 05:49:20,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=947490.0, ans=0.125 2024-08-11 05:49:25,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947490.0, ans=0.1 2024-08-11 05:49:36,163 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7800, loss[loss=0.09275, beats_loss=0.01461, ecapa_loss=0.0001785, whisper_loss=0.07636, over 21570.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01136, ecapa_loss=0.000207, whisper_loss=0.0937, over 3872558.71 frames. ], batch size: 88, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:49:43,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=947590.0, ans=0.2 2024-08-11 05:49:58,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=947690.0, ans=0.0 2024-08-11 05:50:11,368 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 05:50:21,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=947890.0, ans=0.0 2024-08-11 05:50:39,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.753e+01 3.128e+01 3.537e+01 5.360e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 05:50:49,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=947990.0, ans=0.0 2024-08-11 05:50:51,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=948090.0, ans=0.2 2024-08-11 05:50:53,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7850, loss[loss=0.1164, beats_loss=0.008328, ecapa_loss=0.0002552, whisper_loss=0.1055, over 16871.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01139, ecapa_loss=0.0002079, whisper_loss=0.09324, over 3888141.24 frames. ], batch size: 69, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:51:10,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=948190.0, ans=0.125 2024-08-11 05:51:12,946 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 05:51:15,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=948190.0, ans=0.125 2024-08-11 05:51:25,005 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 05:51:25,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=948290.0, ans=0.04949747468305833 2024-08-11 05:51:55,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=948490.0, ans=0.0 2024-08-11 05:52:09,687 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7900, loss[loss=0.1179, beats_loss=0.01101, ecapa_loss=0.0002279, whisper_loss=0.1046, over 19568.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01146, ecapa_loss=0.0002065, whisper_loss=0.09347, over 3871971.93 frames. ], batch size: 78, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:52:19,879 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 05:52:23,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=948590.0, ans=0.125 2024-08-11 05:52:56,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=948890.0, ans=0.0 2024-08-11 05:52:57,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=948890.0, ans=0.0 2024-08-11 05:53:02,259 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 05:53:03,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=948890.0, ans=0.125 2024-08-11 05:53:09,890 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 05:53:14,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.621e+01 3.000e+01 3.506e+01 5.251e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-11 05:53:17,958 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 05:53:29,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 7950, loss[loss=0.08773, beats_loss=0.01188, ecapa_loss=0.0001767, whisper_loss=0.07409, over 17230.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01146, ecapa_loss=0.0002058, whisper_loss=0.09345, over 3869011.58 frames. ], batch size: 66, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:53:41,271 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-11 05:54:25,851 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-11 05:54:28,023 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 05:54:39,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=949490.0, ans=0.125 2024-08-11 05:54:50,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8000, loss[loss=0.1191, beats_loss=0.008534, ecapa_loss=0.0002672, whisper_loss=0.1079, over 19362.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002069, whisper_loss=0.09397, over 3897458.45 frames. ], batch size: 82, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:55:12,851 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 05:55:22,857 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 05:55:26,776 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
34 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 05:55:27,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=949790.0, ans=0.1 2024-08-11 05:55:32,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=949790.0, ans=0.125 2024-08-11 05:55:34,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=949790.0, ans=0.0 2024-08-11 05:55:58,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.705e+01 3.037e+01 3.592e+01 7.289e+01, threshold=6.074e+01, percent-clipped=2.0 2024-08-11 05:55:59,731 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 05:56:02,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=949990.0, ans=0.125 2024-08-11 05:56:05,100 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 05:56:08,166 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 05:56:09,402 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 05:56:10,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8050, loss[loss=0.1029, beats_loss=0.01208, ecapa_loss=0.0001805, whisper_loss=0.08905, over 15154.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0002075, whisper_loss=0.09435, over 3890128.65 frames. ], batch size: 59, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:56:22,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=950090.0, ans=0.125 2024-08-11 05:56:48,371 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 05:56:53,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=950290.0, ans=0.07 2024-08-11 05:56:55,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=950290.0, ans=0.125 2024-08-11 05:57:00,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=950390.0, ans=0.2 2024-08-11 05:57:03,480 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 05:57:07,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950390.0, ans=0.1 2024-08-11 05:57:21,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2024-08-11 05:57:28,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8100, loss[loss=0.102, beats_loss=0.0115, ecapa_loss=0.0002162, whisper_loss=0.08838, over 17501.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01132, ecapa_loss=0.0002084, whisper_loss=0.09455, over 3893314.52 frames. ], batch size: 68, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:57:40,743 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 05:57:50,331 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 05:58:01,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=950790.0, ans=0.125 2024-08-11 05:58:08,885 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 05:58:14,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=950790.0, ans=0.2 2024-08-11 05:58:16,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=950890.0, ans=15.0 2024-08-11 05:58:27,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=950890.0, ans=0.0 2024-08-11 05:58:36,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.727e+01 3.067e+01 3.354e+01 4.801e+01, threshold=6.134e+01, percent-clipped=0.0 2024-08-11 05:58:41,516 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 05:58:46,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=12.0 2024-08-11 05:58:51,411 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8150, loss[loss=0.07825, beats_loss=0.01229, ecapa_loss=0.0001788, whisper_loss=0.06418, over 17597.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01136, ecapa_loss=0.0002089, whisper_loss=0.09385, over 3900080.67 frames. ], batch size: 70, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:58:54,822 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 13 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 05:59:04,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. 
limit=12.0 2024-08-11 05:59:05,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=951090.0, ans=10.0 2024-08-11 05:59:11,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951190.0, ans=0.1 2024-08-11 05:59:23,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=951290.0, ans=0.2 2024-08-11 05:59:31,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-11 05:59:35,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=951290.0, ans=0.0 2024-08-11 05:59:38,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=951390.0, ans=0.125 2024-08-11 05:59:56,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=951490.0, ans=0.125 2024-08-11 06:00:06,911 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 06:00:13,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8200, loss[loss=0.1051, beats_loss=0.01335, ecapa_loss=0.000173, whisper_loss=0.09004, over 23574.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0002088, whisper_loss=0.09356, over 3917056.74 frames. ], batch size: 94, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:00:40,759 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.300e-02 2024-08-11 06:01:06,779 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 06:01:12,169 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.071e-02 2024-08-11 06:01:19,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.662e+01 3.047e+01 3.528e+01 2.595e+02, threshold=6.093e+01, percent-clipped=1.0 2024-08-11 06:01:23,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=951990.0, ans=0.125 2024-08-11 06:01:34,467 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8250, loss[loss=0.1008, beats_loss=0.01185, ecapa_loss=0.0002433, whisper_loss=0.08652, over 13139.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01134, ecapa_loss=0.000207, whisper_loss=0.09407, over 3924598.91 frames. ], batch size: 54, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:01:42,394 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-11 06:01:53,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5 2024-08-11 06:02:03,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=952190.0, ans=0.125 2024-08-11 06:02:08,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-11 06:02:09,488 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.356e+00 2024-08-11 06:02:21,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=952290.0, ans=0.0 2024-08-11 06:02:41,729 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 06:02:42,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=952490.0, ans=0.125 2024-08-11 06:02:54,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8300, loss[loss=0.1048, beats_loss=0.01031, ecapa_loss=0.0002318, whisper_loss=0.09219, over 20987.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002054, whisper_loss=0.09336, over 3929159.26 frames. ], batch size: 85, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:02:56,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=952590.0, ans=0.0 2024-08-11 06:03:13,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=952690.0, ans=0.125 2024-08-11 06:03:13,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=15.0 2024-08-11 06:03:20,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=952690.0, ans=0.1 2024-08-11 06:03:23,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=952690.0, ans=0.1 2024-08-11 06:03:28,751 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 06:03:34,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=952790.0, ans=0.125 2024-08-11 06:03:46,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=952890.0, ans=0.125 2024-08-11 06:03:49,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=952890.0, ans=0.125 2024-08-11 06:03:58,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.727e+01 2.981e+01 3.576e+01 6.756e+01, threshold=5.962e+01, percent-clipped=1.0 2024-08-11 06:04:00,004 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 06:04:11,038 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 06:04:12,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8350, loss[loss=0.111, beats_loss=0.01024, ecapa_loss=0.0002131, whisper_loss=0.09867, over 23119.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.0002062, whisper_loss=0.09322, over 3919805.32 frames. ], batch size: 92, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:04:23,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=953090.0, ans=0.125 2024-08-11 06:04:45,933 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 06:05:03,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=953390.0, ans=0.2 2024-08-11 06:05:19,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=15.0 2024-08-11 06:05:33,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8400, loss[loss=0.08889, beats_loss=0.01212, ecapa_loss=0.0001982, whisper_loss=0.07479, over 15883.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0002072, whisper_loss=0.09388, over 3909079.49 frames. ], batch size: 65, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:05:38,302 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 06:05:47,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=953690.0, ans=0.125 2024-08-11 06:06:02,004 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 06:06:03,632 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 06:06:08,933 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 06:06:11,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=953790.0, ans=0.125 2024-08-11 06:06:15,589 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 06:06:17,085 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-11 06:06:24,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=953890.0, ans=0.09899494936611666 2024-08-11 06:06:39,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=953990.0, ans=0.0 2024-08-11 06:06:40,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.819e+01 3.267e+01 3.747e+01 3.320e+02, threshold=6.533e+01, percent-clipped=4.0 2024-08-11 06:06:50,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=953990.0, ans=0.2 2024-08-11 06:06:54,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8450, loss[loss=0.1171, beats_loss=0.0114, ecapa_loss=0.0001922, whisper_loss=0.1037, over 23857.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.0002075, whisper_loss=0.09407, over 3888547.01 frames. ], batch size: 91, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:06:57,148 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-11 06:07:02,020 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 06:07:16,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2024-08-11 06:07:18,603 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 06:07:23,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=954190.0, ans=0.125 2024-08-11 06:08:14,949 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-11 06:08:17,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8500, loss[loss=0.1004, beats_loss=0.0119, ecapa_loss=0.000231, whisper_loss=0.08622, over 20202.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01135, ecapa_loss=0.0002076, whisper_loss=0.09414, over 3909336.13 frames. ], batch size: 83, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:08:20,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=954590.0, ans=0.125 2024-08-11 06:08:41,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954690.0, ans=0.1 2024-08-11 06:08:48,695 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-11 06:08:56,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=954790.0, ans=0.0 2024-08-11 06:09:05,076 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 06:09:19,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=954890.0, ans=0.125 2024-08-11 06:09:25,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.679e+01 3.057e+01 3.369e+01 5.558e+01, threshold=6.114e+01, percent-clipped=0.0 2024-08-11 06:09:27,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=954990.0, ans=0.2 2024-08-11 06:09:32,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=954990.0, ans=0.125 2024-08-11 06:09:39,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8550, loss[loss=0.1057, beats_loss=0.01245, ecapa_loss=0.0001522, whisper_loss=0.09175, over 17991.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01148, ecapa_loss=0.0002059, whisper_loss=0.09376, over 3928256.85 frames. ], batch size: 66, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:09:58,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-11 06:10:10,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=955190.0, ans=0.125 2024-08-11 06:10:10,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=955190.0, ans=0.125 2024-08-11 06:10:53,007 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 06:10:54,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=955490.0, ans=0.125 2024-08-11 06:10:58,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=955490.0, ans=0.125 2024-08-11 06:11:00,313 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 06:11:05,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8600, loss[loss=0.1151, beats_loss=0.01266, ecapa_loss=0.00018, whisper_loss=0.1006, over 21081.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002055, whisper_loss=0.09365, over 3905067.61 frames. ], batch size: 87, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:11:10,413 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 06:11:33,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=955690.0, ans=0.1 2024-08-11 06:11:44,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=955790.0, ans=0.05 2024-08-11 06:11:52,701 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 06:12:14,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.782e+01 3.171e+01 3.818e+01 6.085e+01, threshold=6.342e+01, percent-clipped=0.0 2024-08-11 06:12:24,484 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 06:12:28,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8650, loss[loss=0.1012, beats_loss=0.01235, ecapa_loss=0.0001961, whisper_loss=0.08693, over 18993.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01151, ecapa_loss=0.0002066, whisper_loss=0.09333, over 3919580.33 frames. ], batch size: 76, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:12:35,465 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 17 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 06:12:38,357 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.123e-02 2024-08-11 06:12:54,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2024-08-11 06:12:55,919 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 06:13:35,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956490.0, ans=0.1 2024-08-11 06:13:41,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=956490.0, ans=0.0 2024-08-11 06:13:44,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=956490.0, ans=0.09899494936611666 2024-08-11 06:13:50,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-11 06:13:52,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8700, loss[loss=0.09755, beats_loss=0.01309, ecapa_loss=0.0002291, whisper_loss=0.08217, over 21620.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01151, ecapa_loss=0.000207, whisper_loss=0.09341, over 3922491.14 frames. ], batch size: 91, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:14:07,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=956690.0, ans=0.125 2024-08-11 06:14:27,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=956790.0, ans=0.125 2024-08-11 06:14:30,783 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 06:14:38,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=956790.0, ans=0.125 2024-08-11 06:14:41,971 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 06:14:53,563 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 06:14:55,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=956990.0, ans=0.125 2024-08-11 06:14:57,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.743e+01 3.051e+01 3.561e+01 4.836e+01, threshold=6.102e+01, percent-clipped=0.0 2024-08-11 06:14:58,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=956990.0, ans=0.07 2024-08-11 06:15:08,329 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 06:15:11,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8750, loss[loss=0.08764, beats_loss=0.01226, ecapa_loss=0.0002293, whisper_loss=0.07308, over 20435.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0115, ecapa_loss=0.0002062, whisper_loss=0.0928, over 3892859.99 frames. ], batch size: 88, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:15:14,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=957090.0, ans=0.0 2024-08-11 06:15:39,308 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 11 from Vox, 50 fro AS 2024-08-11 06:15:54,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=957290.0, ans=0.2 2024-08-11 06:16:08,716 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 06:16:29,520 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8800, loss[loss=0.1099, beats_loss=0.0125, ecapa_loss=0.0001934, whisper_loss=0.09547, over 22930.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01164, ecapa_loss=0.0002048, whisper_loss=0.09249, over 3898701.49 frames. 
], batch size: 92, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:16:44,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=957690.0, ans=0.125 2024-08-11 06:16:51,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.46 vs. limit=22.5 2024-08-11 06:17:02,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2024-08-11 06:17:13,096 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 06:17:33,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.553e+01 2.761e+01 3.256e+01 4.911e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-11 06:17:49,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8850, loss[loss=0.1155, beats_loss=0.0118, ecapa_loss=0.0001699, whisper_loss=0.102, over 23926.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01162, ecapa_loss=0.0002042, whisper_loss=0.09247, over 3895045.11 frames. ], batch size: 89, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:18:00,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=958090.0, ans=0.125 2024-08-11 06:18:04,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=958190.0, ans=0.0 2024-08-11 06:18:05,830 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 06:18:17,383 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 06:18:21,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=958290.0, ans=0.0 2024-08-11 06:18:32,676 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-11 06:18:37,045 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 06:18:38,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0 2024-08-11 06:18:53,897 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 06:18:56,139 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 06:19:01,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=958490.0, ans=0.125 2024-08-11 06:19:04,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=958490.0, ans=0.0 2024-08-11 06:19:09,628 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-11 06:19:10,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8900, loss[loss=0.1115, beats_loss=0.009053, ecapa_loss=0.0002031, whisper_loss=0.1004, over 15290.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01152, ecapa_loss=0.0002045, whisper_loss=0.09255, over 3877615.48 frames. 
], batch size: 56, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:19:23,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=958590.0, ans=0.0 2024-08-11 06:19:37,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=958690.0, ans=0.2 2024-08-11 06:19:58,419 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-11 06:20:13,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.703e+01 3.133e+01 3.628e+01 5.499e+01, threshold=6.267e+01, percent-clipped=0.0 2024-08-11 06:20:13,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=958990.0, ans=0.2 2024-08-11 06:20:15,461 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 06:20:18,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=958990.0, ans=0.0 2024-08-11 06:20:19,273 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 06:20:26,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 8950, loss[loss=0.08895, beats_loss=0.01218, ecapa_loss=0.000237, whisper_loss=0.0744, over 15048.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.0002043, whisper_loss=0.09341, over 3883858.61 frames. 
], batch size: 63, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:20:33,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=959090.0, ans=0.125 2024-08-11 06:20:36,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=959090.0, ans=0.2 2024-08-11 06:20:43,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:20:50,694 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-11 06:20:56,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.07 vs. limit=10.0 2024-08-11 06:21:05,764 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 06:21:09,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2024-08-11 06:21:19,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-11 06:21:40,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9000, loss[loss=0.1195, beats_loss=0.008965, ecapa_loss=0.0002532, whisper_loss=0.108, over 14383.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01156, ecapa_loss=0.0002057, whisper_loss=0.09295, over 3888164.79 frames. 
], batch size: 57, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:21:40,486 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 06:22:22,424 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on ASR_libri: loss=0.2572, beats_loss=0, ecapa_loss=0.0006695, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 06:22:40,870 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on SV_voxceleb1: loss=0.005671, beats_loss=0, ecapa_loss=0.0005671, whisper_loss=0, over 939242.00 frames. 2024-08-11 06:24:43,621 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 06:24:43,625 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 06:24:53,952 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 06:25:08,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=959690.0, ans=0.1 2024-08-11 06:25:35,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=959890.0, ans=0.125 2024-08-11 06:25:38,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=959890.0, ans=0.07 2024-08-11 06:25:44,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.51 vs. 
limit=22.5 2024-08-11 06:25:49,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.675e+01 2.932e+01 3.308e+01 5.321e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 06:26:02,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=960090.0, ans=0.125 2024-08-11 06:26:03,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9050, loss[loss=0.1187, beats_loss=0.01209, ecapa_loss=0.0001738, whisper_loss=0.1049, over 18259.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=0.0002063, whisper_loss=0.09351, over 3856717.04 frames. ], batch size: 70, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:26:13,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-08-11 06:26:21,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=960190.0, ans=0.0 2024-08-11 06:26:43,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=960290.0, ans=0.0 2024-08-11 06:26:43,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=960290.0, ans=0.125 2024-08-11 06:27:05,684 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 06:27:08,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=960390.0, ans=0.1 2024-08-11 06:27:11,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. 
limit=6.0 2024-08-11 06:27:19,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960490.0, ans=0.125 2024-08-11 06:27:20,507 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 06:27:26,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=960490.0, ans=0.0 2024-08-11 06:27:30,772 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 06:27:32,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9100, loss[loss=0.112, beats_loss=0.01193, ecapa_loss=0.0002177, whisper_loss=0.09788, over 21106.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01149, ecapa_loss=0.0002068, whisper_loss=0.09345, over 3865631.70 frames. ], batch size: 82, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:27:42,390 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 06:27:44,335 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-11 06:28:42,175 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 06:28:52,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.825e+01 3.107e+01 3.810e+01 5.498e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 06:28:54,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=960990.0, ans=0.125 2024-08-11 06:29:00,299 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 06:29:10,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9150, loss[loss=0.09881, beats_loss=0.01455, ecapa_loss=0.0001908, whisper_loss=0.08236, over 15774.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01143, ecapa_loss=0.0002065, whisper_loss=0.09424, over 3915307.68 frames. ], batch size: 63, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:29:17,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.20 vs. limit=22.5 2024-08-11 06:29:23,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-11 06:29:24,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=961090.0, ans=0.125 2024-08-11 06:29:33,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=961190.0, ans=0.125 2024-08-11 06:30:12,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-08-11 06:30:31,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=961490.0, ans=0.2 2024-08-11 06:30:43,986 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9200, loss[loss=0.1091, beats_loss=0.01231, ecapa_loss=0.0002061, whisper_loss=0.09469, over 18421.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01143, ecapa_loss=0.0002063, whisper_loss=0.0939, over 3894734.92 frames. ], batch size: 71, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:30:44,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=961590.0, ans=0.125 2024-08-11 06:30:51,804 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 06:31:01,474 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 06:31:10,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=961690.0, ans=0.125 2024-08-11 06:31:24,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-11 06:31:54,129 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 06:31:54,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-08-11 06:32:06,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.686e+01 3.168e+01 3.590e+01 6.490e+01, threshold=6.336e+01, percent-clipped=1.0 2024-08-11 06:32:19,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=961990.0, ans=0.125 2024-08-11 06:32:26,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9250, loss[loss=0.1024, beats_loss=0.01317, ecapa_loss=0.0001904, whisper_loss=0.0873, over 22336.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01145, ecapa_loss=0.0002063, whisper_loss=0.09409, over 3915094.69 frames. ], batch size: 91, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:32:26,941 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 06:32:31,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:32:32,678 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 06:32:37,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:32:37,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:33:10,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=962290.0, ans=0.0 2024-08-11 06:33:12,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.05 vs. limit=6.0 2024-08-11 06:33:13,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=962290.0, ans=0.0 2024-08-11 06:33:49,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9300, loss[loss=0.1015, beats_loss=0.01233, ecapa_loss=0.0001919, whisper_loss=0.0873, over 18322.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0114, ecapa_loss=0.0002056, whisper_loss=0.09414, over 3911734.50 frames. ], batch size: 74, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:34:00,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2024-08-11 06:34:18,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=962790.0, ans=0.1 2024-08-11 06:34:27,083 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 06:34:50,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.791e+01 3.053e+01 3.524e+01 6.115e+01, threshold=6.107e+01, percent-clipped=0.0 2024-08-11 06:35:03,204 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9350, loss[loss=0.107, beats_loss=0.01305, ecapa_loss=0.0001692, whisper_loss=0.09223, over 19033.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002059, whisper_loss=0.09381, over 3854172.52 frames. ], batch size: 72, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:35:13,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.86 vs. limit=10.0 2024-08-11 06:35:21,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=963190.0, ans=0.125 2024-08-11 06:35:24,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=963190.0, ans=0.2 2024-08-11 06:35:28,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=963190.0, ans=0.0 2024-08-11 06:35:30,985 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 06:35:41,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963290.0, ans=0.1 2024-08-11 06:35:42,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-11 06:35:43,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=12.0 2024-08-11 06:35:58,179 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
23 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 06:35:59,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=963390.0, ans=0.125 2024-08-11 06:36:03,479 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 26 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 06:36:15,481 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 06:36:17,977 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9400, loss[loss=0.1066, beats_loss=0.01163, ecapa_loss=0.0001959, whisper_loss=0.09305, over 19635.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.000207, whisper_loss=0.09387, over 3861212.29 frames. ], batch size: 79, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:36:39,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=963690.0, ans=0.125 2024-08-11 06:36:41,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=963690.0, ans=0.125 2024-08-11 06:36:53,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=963790.0, ans=0.0 2024-08-11 06:37:15,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=963890.0, ans=0.2 2024-08-11 06:37:18,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.687e+01 3.013e+01 3.513e+01 7.296e+01, threshold=6.026e+01, percent-clipped=1.0 2024-08-11 06:37:21,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.81 vs. limit=5.0 2024-08-11 06:37:29,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.13 vs. 
limit=10.0 2024-08-11 06:37:32,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9450, loss[loss=0.1016, beats_loss=0.01071, ecapa_loss=0.0002152, whisper_loss=0.08872, over 18739.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01141, ecapa_loss=0.0002075, whisper_loss=0.09369, over 3882455.49 frames. ], batch size: 77, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:37:38,617 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 06:37:47,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=964190.0, ans=0.125 2024-08-11 06:37:52,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=964190.0, ans=0.125 2024-08-11 06:37:55,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=964190.0, ans=0.0 2024-08-11 06:37:59,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=964190.0, ans=0.125 2024-08-11 06:38:09,450 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 06:38:42,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964490.0, ans=0.1 2024-08-11 06:38:45,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-11 06:38:48,404 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.37 vs. 
limit=22.5 2024-08-11 06:38:48,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9500, loss[loss=0.09814, beats_loss=0.01203, ecapa_loss=0.0002148, whisper_loss=0.08396, over 14979.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01144, ecapa_loss=0.0002081, whisper_loss=0.09292, over 3896362.89 frames. ], batch size: 62, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:38:50,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=964590.0, ans=0.0 2024-08-11 06:39:06,607 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 06:39:19,010 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 06:39:33,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=964890.0, ans=0.0 2024-08-11 06:39:50,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.745e+01 3.159e+01 3.801e+01 1.108e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 06:39:52,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=964990.0, ans=0.0 2024-08-11 06:39:55,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=964990.0, ans=0.1 2024-08-11 06:39:59,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=964990.0, ans=0.2 2024-08-11 06:40:03,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9550, loss[loss=0.09783, beats_loss=0.008319, ecapa_loss=0.0002226, whisper_loss=0.08728, over 14978.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01143, ecapa_loss=0.0002076, whisper_loss=0.09333, over 3890682.84 frames. 
], batch size: 57, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:40:10,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=965090.0, ans=0.0 2024-08-11 06:40:10,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=965090.0, ans=0.2 2024-08-11 06:40:12,762 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 06:40:18,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=965190.0, ans=0.1 2024-08-11 06:40:27,685 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 06:40:31,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=965290.0, ans=0.0 2024-08-11 06:40:35,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=965290.0, ans=0.0 2024-08-11 06:40:36,397 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 06:40:38,047 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 06:40:48,472 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-11 06:40:53,747 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 06:41:06,616 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 06:41:06,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=965490.0, ans=0.125 2024-08-11 06:41:13,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9600, loss[loss=0.1181, beats_loss=0.011, ecapa_loss=0.0001725, whisper_loss=0.1053, over 24304.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0002076, whisper_loss=0.09354, over 3890902.60 frames. ], batch size: 94, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:41:16,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=965590.0, ans=0.125 2024-08-11 06:41:19,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-11 06:41:26,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=12.0 2024-08-11 06:41:27,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-08-11 06:41:33,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=965690.0, ans=0.04949747468305833 2024-08-11 06:41:34,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965690.0, ans=0.1 2024-08-11 06:41:34,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. 
limit=12.0 2024-08-11 06:41:45,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=965790.0, ans=0.2 2024-08-11 06:41:47,931 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 06:41:48,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=965790.0, ans=0.125 2024-08-11 06:42:14,237 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.765e+01 3.049e+01 3.383e+01 4.788e+01, threshold=6.099e+01, percent-clipped=0.0 2024-08-11 06:42:28,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9650, loss[loss=0.1187, beats_loss=0.01254, ecapa_loss=0.0002001, whisper_loss=0.1042, over 22932.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01134, ecapa_loss=0.0002081, whisper_loss=0.09349, over 3861186.54 frames. ], batch size: 94, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:42:50,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=966190.0, ans=0.0 2024-08-11 06:42:53,983 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 06:42:56,876 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 06:42:57,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=966290.0, ans=0.1 2024-08-11 06:43:00,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=966290.0, ans=10.0 2024-08-11 06:43:03,596 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 06:43:29,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:34,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:43,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9700, loss[loss=0.1132, beats_loss=0.009053, ecapa_loss=0.0002495, whisper_loss=0.1017, over 14594.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0002099, whisper_loss=0.09304, over 3846832.41 frames. ], batch size: 60, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:44:09,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2024-08-11 06:44:19,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=966790.0, ans=0.125 2024-08-11 06:44:19,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=966790.0, ans=0.04949747468305833 2024-08-11 06:44:21,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=966790.0, ans=0.125 2024-08-11 06:44:42,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=966990.0, ans=10.0 2024-08-11 06:44:42,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.608e+01 2.892e+01 3.245e+01 5.119e+01, threshold=5.784e+01, percent-clipped=0.0 2024-08-11 06:44:55,496 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9750, loss[loss=0.09627, beats_loss=0.009759, ecapa_loss=0.0002235, whisper_loss=0.08427, over 18280.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01132, ecapa_loss=0.0002098, whisper_loss=0.09301, over 3806976.64 frames. ], batch size: 75, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:44:58,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=967090.0, ans=0.0 2024-08-11 06:45:32,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=967290.0, ans=0.2 2024-08-11 06:45:46,228 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 06:46:07,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9800, loss[loss=0.1105, beats_loss=0.01117, ecapa_loss=0.0002503, whisper_loss=0.09677, over 22584.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01127, ecapa_loss=0.0002092, whisper_loss=0.09377, over 3824400.39 frames. ], batch size: 93, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:46:10,634 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 06:46:16,672 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 06:46:25,143 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 06:46:33,411 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-11 06:46:36,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967790.0, ans=0.125 2024-08-11 06:46:50,613 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 06:46:51,930 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 06:46:54,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=967890.0, ans=0.0 2024-08-11 06:47:05,460 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 06:47:06,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.643e+01 2.929e+01 3.455e+01 6.415e+01, threshold=5.858e+01, percent-clipped=3.0 2024-08-11 06:47:19,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9850, loss[loss=0.1113, beats_loss=0.01177, ecapa_loss=0.000147, whisper_loss=0.09803, over 24001.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01126, ecapa_loss=0.000208, whisper_loss=0.09443, over 3847983.36 frames. ], batch size: 91, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:47:44,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=968190.0, ans=0.1 2024-08-11 06:47:45,844 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-11 06:48:03,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=968390.0, ans=0.0 2024-08-11 06:48:11,707 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 06:48:16,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=968390.0, ans=0.0 2024-08-11 06:48:30,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=968490.0, ans=0.0 2024-08-11 06:48:34,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9900, loss[loss=0.1204, beats_loss=0.01225, ecapa_loss=0.0001883, whisper_loss=0.1062, over 23176.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01134, ecapa_loss=0.000208, whisper_loss=0.09393, over 3881135.17 frames. ], batch size: 91, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:48:41,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.44 vs. limit=22.5 2024-08-11 06:48:46,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=968590.0, ans=0.125 2024-08-11 06:48:49,078 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 06:48:49,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=968690.0, ans=0.125 2024-08-11 06:48:52,406 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 23 from LS+wenet, 20 from Vox, 13 fro AS 2024-08-11 06:49:17,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=968890.0, ans=0.1 2024-08-11 06:49:19,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=968890.0, ans=0.2 2024-08-11 06:49:32,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.797e+01 3.066e+01 3.610e+01 6.025e+01, threshold=6.133e+01, percent-clipped=2.0 2024-08-11 06:49:33,742 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 06:49:44,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-11 06:49:45,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 9950, loss[loss=0.1219, beats_loss=0.009506, ecapa_loss=0.0002568, whisper_loss=0.1098, over 18650.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0002093, whisper_loss=0.09351, over 3856461.50 frames. ], batch size: 75, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:50:08,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=969190.0, ans=0.015 2024-08-11 06:50:23,602 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 06:50:33,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=969390.0, ans=0.0 2024-08-11 06:50:36,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=969390.0, ans=0.125 2024-08-11 06:50:37,643 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 06:50:43,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-11 06:50:57,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=969590.0, ans=0.125 2024-08-11 06:50:58,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10000, loss[loss=0.1016, beats_loss=0.01301, ecapa_loss=0.00018, whisper_loss=0.08678, over 18544.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01128, ecapa_loss=0.0002091, whisper_loss=0.09398, over 3853435.76 frames. ], batch size: 75, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:51:03,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. 
limit=22.5 2024-08-11 06:51:03,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=969590.0, ans=22.5 2024-08-11 06:51:14,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=969690.0, ans=0.125 2024-08-11 06:51:16,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.21 vs. limit=15.0 2024-08-11 06:51:16,581 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-11 06:51:34,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=969790.0, ans=0.2 2024-08-11 06:51:56,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.627e+01 2.974e+01 3.477e+01 5.733e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-11 06:51:56,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=969990.0, ans=0.125 2024-08-11 06:52:09,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10050, loss[loss=0.1149, beats_loss=0.01159, ecapa_loss=0.0002034, whisper_loss=0.1013, over 22281.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01137, ecapa_loss=0.0002072, whisper_loss=0.09346, over 3858831.01 frames. 
], batch size: 91, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:52:15,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=970090.0, ans=0.2 2024-08-11 06:52:25,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=970190.0, ans=0.125 2024-08-11 06:52:28,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=970190.0, ans=0.125 2024-08-11 06:52:43,732 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 06:52:48,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=970290.0, ans=0.125 2024-08-11 06:52:51,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=970390.0, ans=0.0 2024-08-11 06:52:53,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=970390.0, ans=0.125 2024-08-11 06:52:54,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=970390.0, ans=0.95 2024-08-11 06:53:03,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=970490.0, ans=0.2 2024-08-11 06:53:14,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=970490.0, ans=0.125 2024-08-11 06:53:18,083 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10100, loss[loss=0.09432, beats_loss=0.0127, ecapa_loss=0.0001909, whisper_loss=0.07971, over 22697.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0002058, whisper_loss=0.09352, over 3882611.61 frames. 
], batch size: 90, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:53:21,978 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 06:53:22,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=970590.0, ans=0.125 2024-08-11 06:53:32,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=12.0 2024-08-11 06:53:40,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=970690.0, ans=0.125 2024-08-11 06:53:49,430 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 06:53:50,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-08-11 06:53:50,702 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 06:53:57,104 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-11 06:54:06,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=970890.0, ans=0.0 2024-08-11 06:54:07,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=970890.0, ans=0.0 2024-08-11 06:54:11,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.833e+01 3.189e+01 3.704e+01 6.701e+01, threshold=6.379e+01, percent-clipped=2.0 2024-08-11 06:54:12,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. 
limit=6.0 2024-08-11 06:54:23,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10150, loss[loss=0.1001, beats_loss=0.0137, ecapa_loss=0.0001872, whisper_loss=0.08451, over 22027.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002071, whisper_loss=0.09327, over 3909185.56 frames. ], batch size: 92, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:54:37,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=971190.0, ans=0.125 2024-08-11 06:55:04,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=971390.0, ans=0.0 2024-08-11 06:55:16,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=971490.0, ans=0.125 2024-08-11 06:55:22,075 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 06:55:24,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=971490.0, ans=0.2 2024-08-11 06:55:25,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971490.0, ans=0.1 2024-08-11 06:55:28,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10200, loss[loss=0.1116, beats_loss=0.01258, ecapa_loss=0.0001618, whisper_loss=0.09745, over 16666.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01144, ecapa_loss=0.0002081, whisper_loss=0.09275, over 3877424.12 frames. ], batch size: 63, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:55:31,522 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 06:55:35,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. 
limit=12.0 2024-08-11 06:55:35,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=971590.0, ans=0.0 2024-08-11 06:55:39,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=971590.0, ans=0.0 2024-08-11 06:55:41,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2024-08-11 06:55:53,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=12.0 2024-08-11 06:56:01,195 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 06:56:16,283 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-08-11 06:56:16,925 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 06:56:19,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=971990.0, ans=0.125 2024-08-11 06:56:22,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.593e+01 3.063e+01 3.580e+01 1.842e+02, threshold=6.125e+01, percent-clipped=1.0 2024-08-11 06:56:28,453 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 06:56:33,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10250, loss[loss=0.1103, beats_loss=0.01119, ecapa_loss=0.0002354, whisper_loss=0.09678, over 19763.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0002075, whisper_loss=0.0937, over 3893768.50 frames. 
], batch size: 82, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:56:38,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=972090.0, ans=0.125 2024-08-11 06:56:38,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=972090.0, ans=0.125 2024-08-11 06:56:51,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2024-08-11 06:56:54,960 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 06:57:01,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=972290.0, ans=0.09899494936611666 2024-08-11 06:57:06,508 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 34 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 06:57:09,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972290.0, ans=0.1 2024-08-11 06:57:14,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=972390.0, ans=0.125 2024-08-11 06:57:15,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-08-11 06:57:24,981 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.617e+05 2024-08-11 06:57:31,106 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 06:57:38,640 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10300, loss[loss=0.1292, beats_loss=0.009017, ecapa_loss=0.0002444, whisper_loss=0.1177, over 14048.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01129, ecapa_loss=0.0002072, whisper_loss=0.09329, over 3855887.12 frames. ], batch size: 55, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:57:43,999 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 06:57:57,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=972690.0, ans=22.5 2024-08-11 06:58:00,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=972690.0, ans=0.125 2024-08-11 06:58:15,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=972890.0, ans=0.025 2024-08-11 06:58:17,000 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 06:58:18,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=972890.0, ans=0.2 2024-08-11 06:58:19,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=972890.0, ans=0.0 2024-08-11 06:58:21,351 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 06:58:25,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=972890.0, ans=0.0 2024-08-11 06:58:31,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.762e+01 3.121e+01 3.725e+01 5.735e+01, threshold=6.242e+01, percent-clipped=0.0 2024-08-11 06:58:32,637 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 06:58:35,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=972990.0, ans=0.05 2024-08-11 06:58:36,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=972990.0, ans=0.0 2024-08-11 06:58:40,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.27 vs. limit=15.0 2024-08-11 06:58:42,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10350, loss[loss=0.102, beats_loss=0.01116, ecapa_loss=0.000166, whisper_loss=0.08919, over 23255.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01132, ecapa_loss=0.0002059, whisper_loss=0.09331, over 3857677.81 frames. ], batch size: 91, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:58:42,953 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 06:58:52,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.50 vs. limit=22.5 2024-08-11 06:58:55,825 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 06:58:58,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=973190.0, ans=0.125 2024-08-11 06:59:01,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=973190.0, ans=0.125 2024-08-11 06:59:03,973 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 06:59:18,047 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 06:59:19,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=973290.0, ans=0.1 2024-08-11 06:59:32,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=973390.0, ans=0.125 2024-08-11 06:59:33,830 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 06:59:48,075 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10400, loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0002103, whisper_loss=0.09238, over 22358.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01141, ecapa_loss=0.0002046, whisper_loss=0.0934, over 3872950.20 frames. ], batch size: 91, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:00:08,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=973690.0, ans=0.125 2024-08-11 07:00:09,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=973690.0, ans=0.0 2024-08-11 07:00:15,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=973790.0, ans=0.125 2024-08-11 07:00:32,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=973890.0, ans=10.0 2024-08-11 07:00:34,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=973890.0, ans=0.1 2024-08-11 07:00:35,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=973890.0, ans=0.0 2024-08-11 07:00:37,182 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=973890.0, ans=0.125 2024-08-11 07:00:41,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=973990.0, ans=0.0 2024-08-11 07:00:42,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.630e+01 2.925e+01 3.255e+01 4.896e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-11 07:00:53,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10450, loss[loss=0.08201, beats_loss=0.01466, ecapa_loss=0.0001856, whisper_loss=0.06549, over 22226.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01145, ecapa_loss=0.0002042, whisper_loss=0.09306, over 3887615.99 frames. ], batch size: 93, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:01:09,684 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 07:01:11,018 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 07:01:13,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=974190.0, ans=0.125 2024-08-11 07:01:49,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=974490.0, ans=0.2 2024-08-11 07:01:53,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=974490.0, ans=0.0 2024-08-11 07:01:54,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=974490.0, ans=0.0 2024-08-11 07:02:02,218 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10500, loss[loss=0.1007, beats_loss=0.01444, ecapa_loss=0.0002143, whisper_loss=0.08414, over 15837.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0002071, whisper_loss=0.0936, over 3862052.56 frames. 
], batch size: 68, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:02:22,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=15.0 2024-08-11 07:02:27,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=974690.0, ans=0.125 2024-08-11 07:02:42,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=974890.0, ans=0.125 2024-08-11 07:02:46,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-08-11 07:02:57,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.661e+01 2.970e+01 3.368e+01 5.123e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-11 07:03:10,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10550, loss[loss=0.1251, beats_loss=0.01008, ecapa_loss=0.0002389, whisper_loss=0.1127, over 22003.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0002056, whisper_loss=0.09347, over 3873666.86 frames. ], batch size: 90, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:03:14,060 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 07:03:20,670 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 07:03:28,657 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-11 07:03:44,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=975290.0, ans=0.125 2024-08-11 07:03:46,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=975290.0, ans=0.125 2024-08-11 07:03:56,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=975390.0, ans=0.125 2024-08-11 07:04:01,454 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 07:04:01,996 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.687e-02 2024-08-11 07:04:18,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10600, loss[loss=0.09629, beats_loss=0.01016, ecapa_loss=0.0001974, whisper_loss=0.08416, over 16791.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01145, ecapa_loss=0.0002049, whisper_loss=0.0927, over 3876200.68 frames. ], batch size: 65, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:04:20,900 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 07:04:30,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=975690.0, ans=0.0 2024-08-11 07:04:43,744 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 07:04:45,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-08-11 07:05:03,997 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 07:05:05,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=975890.0, ans=0.125 2024-08-11 07:05:08,037 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 07:05:09,403 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 07:05:11,776 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.786e+01 3.038e+01 3.518e+01 8.413e+01, threshold=6.076e+01, percent-clipped=1.0 2024-08-11 07:05:23,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10650, loss[loss=0.1231, beats_loss=0.00906, ecapa_loss=0.000221, whisper_loss=0.1118, over 17675.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01138, ecapa_loss=0.0002056, whisper_loss=0.09347, over 3867056.03 frames. ], batch size: 67, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:05:27,903 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 07:05:28,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=976090.0, ans=0.0 2024-08-11 07:05:32,179 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 07:05:34,709 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 07:05:38,937 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 07:05:52,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2024-08-11 07:05:55,682 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 07:06:04,755 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 07:06:15,543 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 07:06:17,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=976490.0, ans=0.125 2024-08-11 07:06:30,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10700, loss[loss=0.1182, beats_loss=0.01071, ecapa_loss=0.0001922, whisper_loss=0.1056, over 21218.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0114, ecapa_loss=0.0002042, whisper_loss=0.0936, over 3879158.08 frames. ], batch size: 82, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:06:34,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=976590.0, ans=0.125 2024-08-11 07:06:59,518 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 07:07:16,805 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 07:07:18,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=976890.0, ans=0.025 2024-08-11 07:07:21,891 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 07:07:24,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.712e+01 3.090e+01 3.800e+01 9.134e+01, threshold=6.180e+01, percent-clipped=2.0 2024-08-11 07:07:32,547 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 07:07:36,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10750, loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001751, whisper_loss=0.09166, over 15773.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01141, ecapa_loss=0.0002053, whisper_loss=0.09423, over 3882510.13 frames. ], batch size: 59, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:08:16,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=977390.0, ans=0.125 2024-08-11 07:08:34,149 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 07:08:42,309 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 07:08:43,342 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10800, loss[loss=0.1197, beats_loss=0.01118, ecapa_loss=0.0001992, whisper_loss=0.1066, over 20767.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01144, ecapa_loss=0.0002062, whisper_loss=0.09471, over 3902687.83 frames. ], batch size: 83, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:08:43,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=977590.0, ans=0.07 2024-08-11 07:08:54,674 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 07:09:06,860 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 07:09:21,794 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 07:09:24,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=977890.0, ans=0.125 2024-08-11 07:09:37,978 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 07:09:39,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.607e+01 2.912e+01 3.510e+01 6.638e+01, threshold=5.825e+01, percent-clipped=1.0 2024-08-11 07:09:48,018 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 07:09:51,647 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10850, loss[loss=0.1441, beats_loss=0.007429, ecapa_loss=0.0001928, whisper_loss=0.1347, over 22460.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01145, ecapa_loss=0.000206, whisper_loss=0.09442, over 3895702.23 frames. ], batch size: 84, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:10:00,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=978090.0, ans=0.0 2024-08-11 07:10:18,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=978290.0, ans=0.0 2024-08-11 07:10:25,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2024-08-11 07:10:39,198 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 07:10:59,784 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10900, loss[loss=0.1163, beats_loss=0.009281, ecapa_loss=0.0002553, whisper_loss=0.1045, over 18915.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01147, ecapa_loss=0.0002043, whisper_loss=0.09423, over 3926384.29 frames. ], batch size: 78, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:11:02,852 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 07:11:12,235 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 07:11:28,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=978790.0, ans=0.125 2024-08-11 07:11:29,766 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-11 07:11:31,067 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 07:11:32,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=978790.0, ans=0.125 2024-08-11 07:11:33,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=978790.0, ans=0.2 2024-08-11 07:11:45,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978890.0, ans=0.1 2024-08-11 07:11:52,623 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 07:11:55,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.834e+01 3.154e+01 3.675e+01 5.808e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 07:11:55,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978990.0, ans=0.1 2024-08-11 07:11:56,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-11 07:12:03,760 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 07:12:05,031 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 07:12:07,511 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 10950, loss[loss=0.1143, beats_loss=0.01109, ecapa_loss=0.0002313, whisper_loss=0.1009, over 21613.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01144, ecapa_loss=0.0002046, whisper_loss=0.09426, over 3941834.49 frames. ], batch size: 90, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:12:24,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979190.0, ans=0.1 2024-08-11 07:12:25,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=979190.0, ans=0.125 2024-08-11 07:12:30,376 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 07:12:41,004 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 07:12:55,390 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 07:13:05,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=979490.0, ans=0.0 2024-08-11 07:13:13,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11000, loss[loss=0.1042, beats_loss=0.01132, ecapa_loss=0.0002126, whisper_loss=0.09076, over 22759.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01141, ecapa_loss=0.0002047, whisper_loss=0.09427, over 3956525.03 frames. ], batch size: 93, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:13:26,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=979690.0, ans=0.125 2024-08-11 07:13:32,687 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 07:13:34,152 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 07:13:37,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-11 07:13:38,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=979690.0, ans=0.125 2024-08-11 07:13:44,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979790.0, ans=0.1 2024-08-11 07:13:46,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=979790.0, ans=0.04949747468305833 2024-08-11 07:13:48,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=979790.0, ans=0.2 2024-08-11 07:13:51,689 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 07:14:08,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=979990.0, ans=0.125 2024-08-11 07:14:08,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.630e+01 2.984e+01 3.392e+01 5.712e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 07:14:20,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11050, loss[loss=0.1237, beats_loss=0.009347, ecapa_loss=0.0002199, whisper_loss=0.1122, over 18683.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01135, ecapa_loss=0.0002062, whisper_loss=0.09383, over 3917312.52 frames. 
], batch size: 73, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:14:31,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=980090.0, ans=0.0 2024-08-11 07:14:37,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=980190.0, ans=0.1 2024-08-11 07:14:57,156 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 07:15:08,178 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 07:15:14,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0 2024-08-11 07:15:25,458 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 07:15:28,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11100, loss[loss=0.1205, beats_loss=0.0123, ecapa_loss=0.0001762, whisper_loss=0.1064, over 15988.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01136, ecapa_loss=0.0002062, whisper_loss=0.09418, over 3896710.18 frames. ], batch size: 63, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:15:36,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980590.0, ans=0.1 2024-08-11 07:15:41,679 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 23 from LS+wenet, 21 from Vox, 53 fro AS 2024-08-11 07:15:49,807 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 07:15:52,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=980690.0, ans=0.2 2024-08-11 07:15:56,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=980790.0, ans=0.2 2024-08-11 07:16:02,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=980790.0, ans=0.125 2024-08-11 07:16:06,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=980790.0, ans=0.125 2024-08-11 07:16:17,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=980890.0, ans=0.125 2024-08-11 07:16:21,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=980990.0, ans=0.0 2024-08-11 07:16:22,626 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 32 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 07:16:22,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=980990.0, ans=0.0 2024-08-11 07:16:23,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.722e+01 3.049e+01 3.591e+01 6.029e+01, threshold=6.098e+01, percent-clipped=1.0 2024-08-11 07:16:36,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11150, loss[loss=0.09272, beats_loss=0.01246, ecapa_loss=0.0002085, whisper_loss=0.07818, over 21775.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01139, ecapa_loss=0.0002054, whisper_loss=0.09397, over 3919199.80 frames. 
], batch size: 91, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:16:38,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2024-08-11 07:16:46,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=981090.0, ans=0.125 2024-08-11 07:16:53,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=981190.0, ans=0.1 2024-08-11 07:16:53,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=981190.0, ans=0.125 2024-08-11 07:17:02,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=981290.0, ans=0.125 2024-08-11 07:17:02,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=981290.0, ans=10.0 2024-08-11 07:17:18,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=981390.0, ans=0.1 2024-08-11 07:17:43,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11200, loss[loss=0.1272, beats_loss=0.01015, ecapa_loss=0.0002398, whisper_loss=0.1147, over 22715.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01139, ecapa_loss=0.0002042, whisper_loss=0.09443, over 3941756.66 frames. 
], batch size: 91, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:17:56,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=981690.0, ans=10.0 2024-08-11 07:18:08,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=981690.0, ans=0.2 2024-08-11 07:18:10,711 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 07:18:14,739 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 07:18:30,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-11 07:18:37,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=981990.0, ans=0.07 2024-08-11 07:18:39,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.676e+01 2.993e+01 3.397e+01 5.977e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 07:18:48,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=981990.0, ans=0.1 2024-08-11 07:18:50,499 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 07:18:50,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982090.0, ans=0.1 2024-08-11 07:18:51,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11250, loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0002241, whisper_loss=0.09035, over 20784.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0113, ecapa_loss=0.000205, whisper_loss=0.0952, over 3936217.22 frames. 
], batch size: 83, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:18:53,862 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-11 07:19:00,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2024-08-11 07:19:11,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.39 vs. limit=22.5 2024-08-11 07:19:20,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=982290.0, ans=0.0 2024-08-11 07:19:24,001 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 07:19:37,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-11 07:19:37,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-11 07:19:49,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=982490.0, ans=0.0 2024-08-11 07:19:49,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=982490.0, ans=0.025 2024-08-11 07:19:54,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=982490.0, ans=0.0 2024-08-11 07:19:59,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11300, loss[loss=0.1076, beats_loss=0.0108, ecapa_loss=0.0002093, whisper_loss=0.09466, over 16645.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01127, ecapa_loss=0.0002033, whisper_loss=0.09514, over 3934056.91 frames. ], batch size: 67, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:20:03,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-11 07:20:12,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=982690.0, ans=0.2 2024-08-11 07:20:15,010 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 07:20:16,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=982690.0, ans=0.125 2024-08-11 07:20:37,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=982790.0, ans=0.5 2024-08-11 07:20:40,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=982890.0, ans=0.0 2024-08-11 07:20:46,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=982890.0, ans=0.125 2024-08-11 07:20:53,017 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.820e-02 2024-08-11 07:20:53,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.719e+01 3.008e+01 3.388e+01 1.679e+02, threshold=6.016e+01, percent-clipped=1.0 2024-08-11 07:20:56,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=982990.0, ans=0.0 2024-08-11 07:20:57,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=982990.0, ans=0.125 2024-08-11 07:21:04,642 INFO [scaling.py:214] 
(1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=983090.0, ans=0.1 2024-08-11 07:21:05,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11350, loss[loss=0.0965, beats_loss=0.01149, ecapa_loss=0.0002387, whisper_loss=0.08262, over 22031.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01118, ecapa_loss=0.000204, whisper_loss=0.09579, over 3945599.49 frames. ], batch size: 91, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:21:17,293 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 07:21:23,420 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 07:21:35,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=983290.0, ans=0.125 2024-08-11 07:21:57,464 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 07:21:59,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=983490.0, ans=0.025 2024-08-11 07:21:59,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=983490.0, ans=0.0 2024-08-11 07:22:01,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=983490.0, ans=0.125 2024-08-11 07:22:05,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=983490.0, ans=0.125 2024-08-11 07:22:10,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11400, loss[loss=0.1266, beats_loss=0.01202, ecapa_loss=0.0001635, whisper_loss=0.1129, over 23301.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01122, ecapa_loss=0.000206, whisper_loss=0.09501, over 3890740.90 frames. 
], batch size: 89, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:22:11,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2024-08-11 07:22:19,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=983590.0, ans=0.0 2024-08-11 07:22:28,468 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 07:22:29,762 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 07:22:47,359 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 07:22:56,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-08-11 07:22:57,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=983890.0, ans=0.07 2024-08-11 07:22:58,600 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 07:23:02,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.896e+01 3.252e+01 3.905e+01 6.465e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-11 07:23:13,919 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11450, loss[loss=0.1175, beats_loss=0.008892, ecapa_loss=0.0001771, whisper_loss=0.1068, over 18882.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01131, ecapa_loss=0.0002047, whisper_loss=0.09432, over 3894345.27 frames. ], batch size: 69, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:23:21,780 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 07:23:28,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.97 vs. limit=22.5 2024-08-11 07:24:08,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=984490.0, ans=0.0 2024-08-11 07:24:12,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=984490.0, ans=0.125 2024-08-11 07:24:12,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=984490.0, ans=0.125 2024-08-11 07:24:13,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=984490.0, ans=0.2 2024-08-11 07:24:23,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11500, loss[loss=0.06904, beats_loss=0.01559, ecapa_loss=0.0001294, whisper_loss=0.05215, over 14444.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0002033, whisper_loss=0.09416, over 3868228.81 frames. ], batch size: 54, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:24:42,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=984690.0, ans=0.125 2024-08-11 07:24:48,618 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 07:25:18,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=984790.0, ans=0.025 2024-08-11 07:25:18,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984790.0, ans=0.1 2024-08-11 07:25:26,229 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
22 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-11 07:25:44,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=984990.0, ans=0.0 2024-08-11 07:25:44,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-11 07:25:45,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.765e+01 3.010e+01 3.592e+01 5.034e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-11 07:26:03,223 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11550, loss[loss=0.08651, beats_loss=0.01163, ecapa_loss=0.0001949, whisper_loss=0.07293, over 18673.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0113, ecapa_loss=0.000204, whisper_loss=0.09401, over 3855527.11 frames. ], batch size: 72, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:26:31,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.36 vs. limit=22.5 2024-08-11 07:27:10,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=985390.0, ans=0.125 2024-08-11 07:27:11,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=985390.0, ans=0.02 2024-08-11 07:27:17,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. 
limit=15.0 2024-08-11 07:27:23,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985390.0, ans=0.1 2024-08-11 07:27:37,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=985490.0, ans=0.1 2024-08-11 07:27:52,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11600, loss[loss=0.112, beats_loss=0.008842, ecapa_loss=0.0002329, whisper_loss=0.1008, over 18414.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0113, ecapa_loss=0.0002051, whisper_loss=0.09421, over 3861416.63 frames. ], batch size: 74, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:28:19,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=985690.0, ans=0.2 2024-08-11 07:29:06,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=985790.0, ans=0.125 2024-08-11 07:29:14,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=985890.0, ans=0.125 2024-08-11 07:29:29,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.587e+01 2.898e+01 3.413e+01 5.144e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-11 07:29:36,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=985990.0, ans=0.125 2024-08-11 07:29:37,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2024-08-11 07:29:44,363 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11650, loss[loss=0.1391, beats_loss=0.01058, ecapa_loss=0.0002105, whisper_loss=0.1265, over 22523.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01127, ecapa_loss=0.000206, whisper_loss=0.09476, over 3888803.55 frames. ], batch size: 88, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:29:54,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.73 vs. limit=15.0 2024-08-11 07:30:05,355 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 07:30:12,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2024-08-11 07:30:21,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=986290.0, ans=0.125 2024-08-11 07:31:01,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=986490.0, ans=0.125 2024-08-11 07:31:05,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=986490.0, ans=0.125 2024-08-11 07:31:13,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11700, loss[loss=0.1056, beats_loss=0.008596, ecapa_loss=0.0002504, whisper_loss=0.09445, over 16023.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0114, ecapa_loss=0.0002047, whisper_loss=0.09419, over 3887409.16 frames. ], batch size: 65, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:31:28,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=986590.0, ans=0.1 2024-08-11 07:31:30,029 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 07:31:45,489 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 07:31:52,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=986790.0, ans=0.125 2024-08-11 07:32:00,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=986790.0, ans=0.0 2024-08-11 07:32:24,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.842e+01 3.149e+01 3.845e+01 7.778e+01, threshold=6.297e+01, percent-clipped=3.0 2024-08-11 07:32:24,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=986990.0, ans=0.125 2024-08-11 07:32:39,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11750, loss[loss=0.0935, beats_loss=0.01198, ecapa_loss=0.0002076, whisper_loss=0.07944, over 16081.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01146, ecapa_loss=0.0002046, whisper_loss=0.09436, over 3874732.44 frames. ], batch size: 65, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:32:40,692 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 07:32:43,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=987090.0, ans=0.125 2024-08-11 07:33:03,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-08-11 07:33:10,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=987190.0, ans=0.95 2024-08-11 07:33:11,573 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 07:33:19,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=987290.0, ans=0.2 2024-08-11 07:33:25,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-08-11 07:33:42,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-11 07:33:47,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=987390.0, ans=0.0 2024-08-11 07:33:48,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=987390.0, ans=10.0 2024-08-11 07:34:09,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11800, loss[loss=0.07531, beats_loss=0.01393, ecapa_loss=0.0002084, whisper_loss=0.05929, over 18773.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01148, ecapa_loss=0.0002037, whisper_loss=0.09463, over 3868801.52 frames. ], batch size: 78, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:34:18,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987590.0, ans=0.1 2024-08-11 07:34:21,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=987590.0, ans=0.1 2024-08-11 07:34:23,163 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 07:34:32,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=987690.0, ans=0.125 2024-08-11 07:34:35,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-11 07:34:44,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-11 07:34:45,294 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 07:34:54,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=987790.0, ans=0.1 2024-08-11 07:35:03,883 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 07:35:12,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=987890.0, ans=0.0 2024-08-11 07:35:14,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.07 vs. limit=15.0 2024-08-11 07:35:18,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.720e+01 3.073e+01 3.423e+01 3.198e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 07:35:30,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=987990.0, ans=0.125 2024-08-11 07:35:31,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-11 07:35:36,247 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11850, loss[loss=0.1226, beats_loss=0.009256, ecapa_loss=0.0001985, whisper_loss=0.1113, over 18029.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.0002049, whisper_loss=0.09415, over 3883147.95 frames. ], batch size: 68, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:35:59,842 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 07:36:07,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=988190.0, ans=0.125 2024-08-11 07:36:28,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=988390.0, ans=0.02 2024-08-11 07:36:33,307 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 19 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 07:36:35,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=988390.0, ans=0.0 2024-08-11 07:36:44,932 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 07:36:49,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=988490.0, ans=0.125 2024-08-11 07:36:55,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=988490.0, ans=0.125 2024-08-11 07:37:01,829 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11900, loss[loss=0.08118, beats_loss=0.01034, ecapa_loss=0.0002478, whisper_loss=0.06836, over 13960.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01151, ecapa_loss=0.0002064, whisper_loss=0.09388, over 3904631.55 frames. ], batch size: 59, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:37:36,065 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
13 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 07:37:49,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=988890.0, ans=0.0 2024-08-11 07:37:56,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=988890.0, ans=0.125 2024-08-11 07:38:06,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.737e+01 3.168e+01 3.571e+01 8.955e+01, threshold=6.335e+01, percent-clipped=2.0 2024-08-11 07:38:06,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=988990.0, ans=0.125 2024-08-11 07:38:13,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988990.0, ans=0.1 2024-08-11 07:38:14,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=988990.0, ans=0.125 2024-08-11 07:38:14,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=988990.0, ans=0.07 2024-08-11 07:38:15,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-08-11 07:38:15,863 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 07:38:20,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 11950, loss[loss=0.09926, beats_loss=0.01269, ecapa_loss=0.0002393, whisper_loss=0.08417, over 14653.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01144, ecapa_loss=0.000208, whisper_loss=0.09414, over 3854027.28 frames. 
], batch size: 59, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:38:25,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=989090.0, ans=0.0 2024-08-11 07:38:41,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-11 07:38:42,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=989190.0, ans=0.0 2024-08-11 07:38:53,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=989290.0, ans=0.125 2024-08-11 07:39:07,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=989390.0, ans=0.125 2024-08-11 07:39:10,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=989390.0, ans=0.09899494936611666 2024-08-11 07:39:22,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-11 07:39:37,874 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12000, loss[loss=0.08777, beats_loss=0.01188, ecapa_loss=0.0002219, whisper_loss=0.07367, over 18967.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0116, ecapa_loss=0.0002063, whisper_loss=0.09315, over 3878309.30 frames. ], batch size: 79, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:39:37,875 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 07:40:13,133 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006674, whisper_loss=0.252, over 922467.00 frames. 
2024-08-11 07:40:32,415 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on SV_voxceleb1: loss=0.005495, beats_loss=0, ecapa_loss=0.0005495, whisper_loss=0, over 939242.00 frames. 2024-08-11 07:42:18,208 INFO [train_multi_KD3.py:1149] (1/4) Epoch 7, validation on AT_audioset: loss=0.02554, beats_loss=0.02554, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 07:42:18,212 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 07:42:24,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=989590.0, ans=0.125 2024-08-11 07:42:29,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.12 vs. limit=15.0 2024-08-11 07:42:52,997 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-11 07:43:07,574 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 07:43:09,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=989890.0, ans=0.07 2024-08-11 07:43:20,236 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 07:43:21,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.747e+01 3.219e+01 3.881e+01 9.695e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 07:43:35,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12050, loss[loss=0.09521, beats_loss=0.01154, ecapa_loss=0.0001937, whisper_loss=0.08174, over 17732.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01153, ecapa_loss=0.0002052, whisper_loss=0.09363, over 3865408.96 frames. 
], batch size: 72, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:43:46,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=990090.0, ans=0.0 2024-08-11 07:43:52,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=990190.0, ans=0.125 2024-08-11 07:43:55,377 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 07:44:18,630 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 35 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 07:44:27,546 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-11 07:44:36,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=990490.0, ans=0.125 2024-08-11 07:44:39,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=990490.0, ans=0.2 2024-08-11 07:44:43,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=990490.0, ans=0.0 2024-08-11 07:44:50,570 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12100, loss[loss=0.123, beats_loss=0.008916, ecapa_loss=0.0001609, whisper_loss=0.1125, over 19228.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01142, ecapa_loss=0.0002067, whisper_loss=0.09452, over 3870942.64 frames. ], batch size: 71, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:45:06,468 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 07:45:18,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.79 vs. limit=10.0 2024-08-11 07:45:20,568 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 07:45:48,074 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 07:45:49,329 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 07:45:49,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=990890.0, ans=0.125 2024-08-11 07:45:54,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.799e+01 3.089e+01 3.650e+01 5.391e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 07:46:10,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12150, loss[loss=0.1052, beats_loss=0.01184, ecapa_loss=0.0001973, whisper_loss=0.09139, over 21493.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01151, ecapa_loss=0.0002077, whisper_loss=0.09376, over 3862469.35 frames. ], batch size: 88, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:46:18,126 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 07:46:26,381 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 07:46:26,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=991190.0, ans=0.125 2024-08-11 07:46:26,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=991190.0, ans=0.0 2024-08-11 07:46:31,266 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-11 07:46:33,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=991190.0, ans=0.2 2024-08-11 07:46:53,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.43 vs. limit=15.0 2024-08-11 07:47:16,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-11 07:47:22,722 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 14 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 07:47:23,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=991490.0, ans=0.125 2024-08-11 07:47:30,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12200, loss[loss=0.1132, beats_loss=0.01334, ecapa_loss=0.0002033, whisper_loss=0.09781, over 22134.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01148, ecapa_loss=0.0002072, whisper_loss=0.09337, over 3859605.69 frames. ], batch size: 90, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:47:37,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=991590.0, ans=0.2 2024-08-11 07:47:47,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=991690.0, ans=0.125 2024-08-11 07:47:47,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=991690.0, ans=0.1 2024-08-11 07:48:15,364 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 07:48:27,879 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2024-08-11 07:48:35,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.629e+01 2.882e+01 3.326e+01 5.595e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 07:48:40,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=991990.0, ans=0.1 2024-08-11 07:48:44,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-08-11 07:48:49,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12250, loss[loss=0.09834, beats_loss=0.01384, ecapa_loss=0.0001566, whisper_loss=0.08293, over 22354.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002068, whisper_loss=0.09363, over 3871216.81 frames. ], batch size: 89, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:49:04,167 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 07:49:08,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992190.0, ans=0.1 2024-08-11 07:49:37,948 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-11 07:49:47,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=992390.0, ans=0.0 2024-08-11 07:50:07,033 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 07:50:07,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=992590.0, ans=0.125 2024-08-11 07:50:08,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12300, loss[loss=0.1149, beats_loss=0.0112, ecapa_loss=0.0001821, whisper_loss=0.1019, over 19681.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01149, ecapa_loss=0.0002076, whisper_loss=0.09365, over 3872056.17 frames. ], batch size: 75, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:50:09,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=992590.0, ans=0.125 2024-08-11 07:50:27,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=992690.0, ans=0.5 2024-08-11 07:50:50,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=992790.0, ans=0.125 2024-08-11 07:50:55,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=992890.0, ans=0.125 2024-08-11 07:50:57,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0 2024-08-11 07:51:04,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=992890.0, ans=0.125 2024-08-11 07:51:05,732 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 07:51:12,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.794e+01 3.118e+01 3.585e+01 7.136e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-11 07:51:17,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=992990.0, ans=0.1 2024-08-11 07:51:18,739 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 07:51:27,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12350, loss[loss=0.1111, beats_loss=0.009816, ecapa_loss=0.0002049, whisper_loss=0.09927, over 17316.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.0002076, whisper_loss=0.0937, over 3893101.41 frames. ], batch size: 70, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:51:38,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=993090.0, ans=0.2 2024-08-11 07:51:40,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=993090.0, ans=0.125 2024-08-11 07:51:43,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993190.0, ans=0.1 2024-08-11 07:51:44,334 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 07:51:45,743 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 26 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-11 07:51:48,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=993190.0, ans=0.125 2024-08-11 07:52:04,374 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 07:52:08,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=993290.0, ans=0.0 2024-08-11 07:52:11,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=993390.0, ans=0.95 2024-08-11 07:52:12,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=993390.0, ans=0.125 2024-08-11 07:52:21,931 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 07:52:33,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=993490.0, ans=0.2 2024-08-11 07:52:41,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12400, loss[loss=0.1098, beats_loss=0.009749, ecapa_loss=0.0001967, whisper_loss=0.0981, over 18707.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01135, ecapa_loss=0.0002078, whisper_loss=0.09388, over 3853437.49 frames. ], batch size: 72, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:52:43,430 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 07:53:00,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993690.0, ans=0.1 2024-08-11 07:53:47,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.944e+01 3.370e+01 3.888e+01 6.179e+01, threshold=6.739e+01, percent-clipped=0.0 2024-08-11 07:53:50,769 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:54:01,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12450, loss[loss=0.1243, beats_loss=0.01055, ecapa_loss=0.0001976, whisper_loss=0.1118, over 18070.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01127, ecapa_loss=0.0002091, whisper_loss=0.0941, over 3849140.03 frames. ], batch size: 72, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:54:01,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=994090.0, ans=0.125 2024-08-11 07:54:05,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=994090.0, ans=0.0 2024-08-11 07:54:07,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2024-08-11 07:54:30,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=994190.0, ans=0.0 2024-08-11 07:55:19,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12500, loss[loss=0.1144, beats_loss=0.009001, ecapa_loss=0.0002375, whisper_loss=0.1031, over 14780.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01125, ecapa_loss=0.00021, whisper_loss=0.09345, over 3822993.36 frames. 
], batch size: 57, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:55:25,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=994590.0, ans=0.0 2024-08-11 07:55:25,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-11 07:55:29,728 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 07:55:55,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=994790.0, ans=0.125 2024-08-11 07:55:58,018 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 18 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-11 07:56:01,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=994790.0, ans=0.125 2024-08-11 07:56:16,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=994890.0, ans=0.125 2024-08-11 07:56:18,701 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 07:56:21,873 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 07:56:23,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.789e+01 3.126e+01 3.797e+01 5.980e+01, threshold=6.252e+01, percent-clipped=0.0 2024-08-11 07:56:26,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=994990.0, ans=0.125 2024-08-11 07:56:35,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=994990.0, ans=0.125 2024-08-11 07:56:37,051 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12550, loss[loss=0.1356, beats_loss=0.008294, ecapa_loss=0.0001981, whisper_loss=0.1254, over 18897.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01131, ecapa_loss=0.0002087, whisper_loss=0.09365, over 3849254.52 frames. ], batch size: 68, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:56:53,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=995190.0, ans=0.125 2024-08-11 07:56:56,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=995190.0, ans=0.125 2024-08-11 07:57:15,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=995290.0, ans=0.2 2024-08-11 07:57:21,011 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 07:57:32,062 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 07:57:42,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=995490.0, ans=0.125 2024-08-11 07:57:47,842 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 07:57:51,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=995490.0, ans=0.125 2024-08-11 07:57:56,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12600, loss[loss=0.09512, beats_loss=0.01142, ecapa_loss=0.000183, whisper_loss=0.08187, over 16504.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0114, ecapa_loss=0.0002084, whisper_loss=0.09344, over 3825662.29 frames. ], batch size: 61, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:58:06,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=995590.0, ans=0.125 2024-08-11 07:58:08,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=995590.0, ans=0.125 2024-08-11 07:58:18,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-08-11 07:58:50,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=995890.0, ans=0.0 2024-08-11 07:58:51,284 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 07:59:00,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.576e+01 3.023e+01 3.555e+01 7.578e+01, threshold=6.047e+01, percent-clipped=3.0 2024-08-11 07:59:16,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=996090.0, ans=0.2 2024-08-11 07:59:17,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12650, loss[loss=0.1051, beats_loss=0.01185, ecapa_loss=0.0001886, whisper_loss=0.09132, over 17353.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0002058, whisper_loss=0.09362, over 3804992.89 frames. 
], batch size: 70, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:59:19,369 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 07:59:20,966 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-11 07:59:24,211 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.112e-03 2024-08-11 07:59:29,757 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 08:00:01,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=996290.0, ans=0.0 2024-08-11 08:00:16,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=996390.0, ans=0.0 2024-08-11 08:00:16,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2024-08-11 08:00:20,064 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 08:00:35,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=996490.0, ans=0.125 2024-08-11 08:00:36,921 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 08:00:42,475 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12700, loss[loss=0.1239, beats_loss=0.0102, ecapa_loss=0.0001978, whisper_loss=0.1117, over 17211.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01144, ecapa_loss=0.0002051, whisper_loss=0.09408, over 3783352.66 frames. ], batch size: 68, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:01:02,986 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
26 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-11 08:01:03,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=996690.0, ans=0.125 2024-08-11 08:01:09,933 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 16 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 08:01:23,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=996790.0, ans=0.125 2024-08-11 08:01:34,756 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 08:01:44,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=12.0 2024-08-11 08:01:53,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.625e+01 2.937e+01 3.351e+01 6.413e+01, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 08:02:06,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=996990.0, ans=0.1 2024-08-11 08:02:10,160 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12750, loss[loss=0.07966, beats_loss=0.01487, ecapa_loss=0.0001443, whisper_loss=0.06334, over 15919.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01145, ecapa_loss=0.0002051, whisper_loss=0.09386, over 3778633.17 frames. ], batch size: 64, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:02:16,298 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 08:02:35,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=997190.0, ans=0.05 2024-08-11 08:02:55,775 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 08:02:56,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=997290.0, ans=0.125 2024-08-11 08:03:13,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=997390.0, ans=0.0 2024-08-11 08:03:14,522 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 08:03:18,072 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 08:03:29,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2024-08-11 08:03:32,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12800, loss[loss=0.1097, beats_loss=0.01077, ecapa_loss=0.0001708, whisper_loss=0.09725, over 16960.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01148, ecapa_loss=0.0002051, whisper_loss=0.09415, over 3794097.06 frames. ], batch size: 66, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:03:51,461 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-11 08:04:20,884 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 08:04:42,787 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.631e+01 3.014e+01 3.452e+01 5.658e+01, threshold=6.028e+01, percent-clipped=0.0 2024-08-11 08:04:45,840 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 08:04:50,941 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:04:56,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12850, loss[loss=0.1092, beats_loss=0.01087, ecapa_loss=0.0001635, whisper_loss=0.09673, over 18490.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01142, ecapa_loss=0.0002077, whisper_loss=0.09429, over 3792430.14 frames. ], batch size: 70, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:05:02,994 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 08:05:29,396 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 08:05:31,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=998290.0, ans=0.0 2024-08-11 08:05:40,733 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-11 08:05:43,465 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 08:05:59,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=998490.0, ans=0.2 2024-08-11 08:06:01,171 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 08:06:13,170 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.499e-02 2024-08-11 08:06:17,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12900, loss[loss=0.0953, beats_loss=0.0133, ecapa_loss=0.0001764, whisper_loss=0.08023, over 22820.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01146, ecapa_loss=0.0002071, whisper_loss=0.09385, over 3794165.47 frames. 
], batch size: 93, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:06:39,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=998690.0, ans=0.0 2024-08-11 08:07:17,414 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 27 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 08:07:24,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.613e+01 2.962e+01 3.305e+01 5.857e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 08:07:42,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 12950, loss[loss=0.08273, beats_loss=0.0125, ecapa_loss=0.0001839, whisper_loss=0.0684, over 15666.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01139, ecapa_loss=0.0002072, whisper_loss=0.09431, over 3817470.26 frames. ], batch size: 63, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:07:49,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=999090.0, ans=0.2 2024-08-11 08:08:18,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=999290.0, ans=0.125 2024-08-11 08:08:50,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=999490.0, ans=0.0 2024-08-11 08:09:11,418 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13000, loss[loss=0.108, beats_loss=0.01071, ecapa_loss=0.0002293, whisper_loss=0.09499, over 21142.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01133, ecapa_loss=0.0002076, whisper_loss=0.09421, over 3835721.14 frames. ], batch size: 90, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:09:34,622 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 08:09:40,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-11 08:09:54,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2024-08-11 08:10:09,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=999890.0, ans=0.125 2024-08-11 08:10:12,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=999890.0, ans=0.0 2024-08-11 08:10:24,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=999990.0, ans=0.125 2024-08-11 08:10:25,057 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.746e+01 3.044e+01 3.535e+01 5.645e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:10:26,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=999990.0, ans=0.125 2024-08-11 08:10:39,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13050, loss[loss=0.1028, beats_loss=0.01117, ecapa_loss=0.0001974, whisper_loss=0.08971, over 22429.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01143, ecapa_loss=0.0002072, whisper_loss=0.09374, over 3856846.94 frames. 
], batch size: 91, lr: 8.74e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:11:13,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1000290.0, ans=0.0 2024-08-11 08:11:13,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1000290.0, ans=0.2 2024-08-11 08:11:17,856 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 08:11:29,253 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 08:11:36,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-11 08:11:42,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1000490.0, ans=0.125 2024-08-11 08:11:43,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-08-11 08:11:51,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1000490.0, ans=0.125 2024-08-11 08:11:55,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13100, loss[loss=0.08216, beats_loss=0.01366, ecapa_loss=0.0001712, whisper_loss=0.06679, over 16978.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002058, whisper_loss=0.09334, over 3842152.10 frames. ], batch size: 70, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:12:08,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. 
limit=15.0 2024-08-11 08:12:12,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000690.0, ans=0.1 2024-08-11 08:12:30,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0 2024-08-11 08:12:45,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000890.0, ans=0.1 2024-08-11 08:12:54,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.920e+01 3.431e+01 3.898e+01 1.839e+02, threshold=6.862e+01, percent-clipped=3.0 2024-08-11 08:12:56,386 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 08:13:05,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000990.0, ans=0.125 2024-08-11 08:13:07,945 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13150, loss[loss=0.1217, beats_loss=0.008554, ecapa_loss=0.0002655, whisper_loss=0.1105, over 21294.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01145, ecapa_loss=0.0002062, whisper_loss=0.09322, over 3862986.22 frames. ], batch size: 88, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:13:15,144 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 08:13:19,206 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 08:13:24,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1001190.0, ans=0.0 2024-08-11 08:13:34,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1001190.0, ans=0.0 2024-08-11 08:13:35,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-11 08:13:37,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1001290.0, ans=0.0 2024-08-11 08:13:37,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001290.0, ans=0.1 2024-08-11 08:14:00,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1001390.0, ans=0.2 2024-08-11 08:14:01,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1001390.0, ans=0.0 2024-08-11 08:14:20,788 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13200, loss[loss=0.1091, beats_loss=0.01332, ecapa_loss=0.0001651, whisper_loss=0.09417, over 20472.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01144, ecapa_loss=0.0002067, whisper_loss=0.09339, over 3861579.03 frames. ], batch size: 78, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:14:23,983 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 08:14:32,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001590.0, ans=0.1 2024-08-11 08:14:38,258 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
25 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-11 08:14:41,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-11 08:14:50,413 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 08:15:03,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1001790.0, ans=0.0 2024-08-11 08:15:22,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.762e+01 3.091e+01 3.560e+01 4.785e+01, threshold=6.182e+01, percent-clipped=0.0 2024-08-11 08:15:24,715 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 08:15:26,086 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 08:15:36,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13250, loss[loss=0.09596, beats_loss=0.01254, ecapa_loss=0.0002084, whisper_loss=0.08133, over 22578.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002071, whisper_loss=0.09325, over 3847262.06 frames. ], batch size: 90, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:15:36,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1002090.0, ans=0.2 2024-08-11 08:15:52,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.91 vs. 
limit=15.0 2024-08-11 08:16:04,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1002190.0, ans=0.0 2024-08-11 08:16:08,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1002290.0, ans=0.125 2024-08-11 08:16:10,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=15.0 2024-08-11 08:16:14,638 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 08:16:16,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1002290.0, ans=0.125 2024-08-11 08:16:41,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1002490.0, ans=0.125 2024-08-11 08:16:42,588 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-11 08:16:44,333 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:16:51,611 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13300, loss[loss=0.1014, beats_loss=0.009129, ecapa_loss=0.0002516, whisper_loss=0.08979, over 16931.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0002063, whisper_loss=0.09321, over 3850478.01 frames. ], batch size: 68, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:16:53,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1002590.0, ans=0.125 2024-08-11 08:17:04,992 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 08:17:09,414 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 08:17:16,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-08-11 08:17:21,500 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 08:17:29,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1002790.0, ans=0.125 2024-08-11 08:17:49,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1002890.0, ans=0.1 2024-08-11 08:17:50,984 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 08:17:52,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2024-08-11 08:17:55,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.657e+01 3.097e+01 3.589e+01 1.012e+02, threshold=6.194e+01, percent-clipped=1.0 2024-08-11 08:18:09,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2024-08-11 08:18:10,268 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13350, loss[loss=0.08075, beats_loss=0.01053, ecapa_loss=0.0002328, whisper_loss=0.0679, over 17133.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0114, ecapa_loss=0.0002069, whisper_loss=0.09267, over 3848615.20 frames. ], batch size: 70, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:18:20,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=15.0 2024-08-11 08:18:31,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1003190.0, ans=0.0 2024-08-11 08:18:31,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1003190.0, ans=0.125 2024-08-11 08:18:37,953 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 08:18:50,959 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 08:18:57,347 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 08:19:09,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-11 08:19:20,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1003490.0, ans=0.125 2024-08-11 08:19:29,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13400, loss[loss=0.1063, beats_loss=0.01363, ecapa_loss=0.0002041, whisper_loss=0.09059, over 21596.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01138, ecapa_loss=0.0002085, whisper_loss=0.09243, over 3820006.64 frames. ], batch size: 87, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:19:43,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1003590.0, ans=0.0 2024-08-11 08:19:47,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.64 vs. 
limit=15.0 2024-08-11 08:19:56,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1003690.0, ans=10.0 2024-08-11 08:20:02,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0 2024-08-11 08:20:03,719 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 08:20:11,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1003790.0, ans=0.125 2024-08-11 08:20:34,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.700e+01 3.139e+01 3.511e+01 8.019e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-11 08:20:43,182 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 08:20:47,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13450, loss[loss=0.09179, beats_loss=0.01339, ecapa_loss=0.0002128, whisper_loss=0.07627, over 21985.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01138, ecapa_loss=0.000208, whisper_loss=0.09264, over 3822330.56 frames. ], batch size: 90, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:20:55,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1004090.0, ans=0.125 2024-08-11 08:21:17,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-11 08:21:18,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. 
limit=15.0 2024-08-11 08:21:21,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1004290.0, ans=0.125 2024-08-11 08:21:31,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1004290.0, ans=0.125 2024-08-11 08:21:32,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2024-08-11 08:21:41,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-11 08:21:41,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-08-11 08:21:43,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1004390.0, ans=0.125 2024-08-11 08:21:47,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-11 08:22:05,959 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13500, loss[loss=0.1071, beats_loss=0.01294, ecapa_loss=0.0002096, whisper_loss=0.09211, over 21423.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01138, ecapa_loss=0.0002086, whisper_loss=0.09301, over 3820880.73 frames. 
], batch size: 84, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:22:07,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1004590.0, ans=0.125 2024-08-11 08:22:11,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1004590.0, ans=0.2 2024-08-11 08:22:16,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1004590.0, ans=0.2 2024-08-11 08:22:19,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1004690.0, ans=0.125 2024-08-11 08:22:24,662 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 08:22:28,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1004690.0, ans=0.125 2024-08-11 08:23:02,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1004990.0, ans=0.125 2024-08-11 08:23:04,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.720e+01 3.065e+01 3.481e+01 5.636e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-11 08:23:18,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13550, loss[loss=0.106, beats_loss=0.01188, ecapa_loss=0.0002035, whisper_loss=0.09212, over 20290.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01136, ecapa_loss=0.0002071, whisper_loss=0.094, over 3861553.35 frames. ], batch size: 84, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:23:20,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. 
limit=15.0 2024-08-11 08:23:22,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:33,253 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 08:23:35,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1005190.0, ans=0.0 2024-08-11 08:23:46,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1005290.0, ans=0.125 2024-08-11 08:23:54,941 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:23:56,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1005290.0, ans=0.0 2024-08-11 08:24:00,235 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 08:24:04,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1005390.0, ans=0.125 2024-08-11 08:24:09,520 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 08:24:24,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1005490.0, ans=0.2 2024-08-11 08:24:32,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13600, loss[loss=0.1164, beats_loss=0.01162, ecapa_loss=0.0002003, whisper_loss=0.1028, over 22959.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002052, whisper_loss=0.09385, over 3869978.88 frames. 
], batch size: 92, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:24:41,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1005590.0, ans=0.125 2024-08-11 08:24:44,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1005590.0, ans=0.125 2024-08-11 08:24:45,444 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 08:24:58,470 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 08:25:00,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1005790.0, ans=0.125 2024-08-11 08:25:03,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1005790.0, ans=0.0 2024-08-11 08:25:31,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.811e+01 3.158e+01 3.669e+01 1.616e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 08:25:33,301 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 08:25:44,425 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13650, loss[loss=0.1036, beats_loss=0.009577, ecapa_loss=0.0002228, whisper_loss=0.09176, over 20420.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01148, ecapa_loss=0.0002051, whisper_loss=0.09292, over 3846816.92 frames. ], batch size: 81, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:26:06,027 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-11 08:26:30,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. 
limit=12.0 2024-08-11 08:26:33,565 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 08:26:42,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1006390.0, ans=0.125 2024-08-11 08:26:44,649 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 08:26:57,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1006490.0, ans=0.2 2024-08-11 08:26:58,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1006490.0, ans=0.125 2024-08-11 08:27:00,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13700, loss[loss=0.08266, beats_loss=0.01306, ecapa_loss=0.0002063, whisper_loss=0.06754, over 13511.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01145, ecapa_loss=0.0002054, whisper_loss=0.09332, over 3823838.57 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:27:13,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1006590.0, ans=0.2 2024-08-11 08:27:37,112 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 08:28:02,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.699e+01 3.024e+01 3.641e+01 8.253e+01, threshold=6.049e+01, percent-clipped=1.0 2024-08-11 08:28:15,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13750, loss[loss=0.1167, beats_loss=0.01138, ecapa_loss=0.0001729, whisper_loss=0.1036, over 20988.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01138, ecapa_loss=0.0002055, whisper_loss=0.09377, over 3853484.50 frames. 
], batch size: 81, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:28:17,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1007090.0, ans=0.125 2024-08-11 08:28:19,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1007090.0, ans=0.0 2024-08-11 08:28:21,921 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 08:29:17,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1007490.0, ans=0.125 2024-08-11 08:29:20,732 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 08:29:28,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1007590.0, ans=0.125 2024-08-11 08:29:30,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13800, loss[loss=0.1146, beats_loss=0.01298, ecapa_loss=0.0002049, whisper_loss=0.09961, over 23298.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01139, ecapa_loss=0.0002063, whisper_loss=0.09408, over 3880033.69 frames. 
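The `ScheduledFloat: name=..., batch_count=..., ans=...` lines report hyperparameters (balancer probabilities, skip rates, bypass scales) that are scheduled as a function of the global batch count. A sketch of a piecewise-linear schedule of that kind, assuming (batch_count, value) knots; the knot values below are illustrative, not taken from the actual `scaling.py`:

```python
import bisect

def scheduled_float(schedule, batch_count):
    """Piecewise-linear value over batch_count, clamped at the end knots.
    `schedule` is a sorted list of (batch_count, value) pairs."""
    xs = [b for b, _ in schedule]
    ys = [v for _, v in schedule]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, batch_count)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

# A balancer `prob` that decays from 0.3 to 0.125 over the first 8000 batches
# would have long since reached its final value at batch_count=1005090,
# matching the many `ans=0.125` readings in the lines above.
print(scheduled_float([(0.0, 0.3), (8000.0, 0.125)], 1005090.0))  # 0.125
```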
], batch size: 94, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:29:32,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1007590.0, ans=0.0 2024-08-11 08:29:32,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1007590.0, ans=0.125 2024-08-11 08:29:37,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1007590.0, ans=0.125 2024-08-11 08:29:48,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1007690.0, ans=0.125 2024-08-11 08:29:50,003 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 08:30:31,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1007890.0, ans=0.125 2024-08-11 08:30:35,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.572e+01 2.803e+01 3.077e+01 5.296e+01, threshold=5.605e+01, percent-clipped=0.0 2024-08-11 08:30:43,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1007990.0, ans=0.025 2024-08-11 08:30:49,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13850, loss[loss=0.1158, beats_loss=0.009785, ecapa_loss=0.0002037, whisper_loss=0.104, over 23395.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01136, ecapa_loss=0.0002058, whisper_loss=0.09452, over 3866106.12 frames. ], batch size: 92, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:31:01,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008090.0, ans=0.1 2024-08-11 08:31:11,809 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 08:31:28,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-08-11 08:31:42,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1008390.0, ans=0.125 2024-08-11 08:31:58,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-08-11 08:32:01,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1008490.0, ans=0.0 2024-08-11 08:32:10,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13900, loss[loss=0.1157, beats_loss=0.009254, ecapa_loss=0.0002726, whisper_loss=0.1038, over 21192.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01131, ecapa_loss=0.000206, whisper_loss=0.09535, over 3914907.97 frames. ], batch size: 86, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:32:42,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1008790.0, ans=0.05 2024-08-11 08:32:55,421 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-11 08:32:58,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-08-11 08:33:14,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.808e+01 3.104e+01 3.560e+01 5.037e+01, threshold=6.208e+01, percent-clipped=0.0 2024-08-11 08:33:15,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=15.0 2024-08-11 08:33:17,796 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:33:27,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1009090.0, ans=0.125 2024-08-11 08:33:28,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 13950, loss[loss=0.1175, beats_loss=0.01013, ecapa_loss=0.0002115, whisper_loss=0.1053, over 23134.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01134, ecapa_loss=0.0002051, whisper_loss=0.09487, over 3933004.01 frames. ], batch size: 91, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:33:29,696 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-11 08:33:36,115 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 08:33:36,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.25 vs. limit=10.0 2024-08-11 08:33:38,166 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.090e+00 2024-08-11 08:33:48,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.22 vs. 
limit=15.0 2024-08-11 08:33:52,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1009190.0, ans=15.0 2024-08-11 08:34:06,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1009290.0, ans=0.125 2024-08-11 08:34:20,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1009390.0, ans=0.125 2024-08-11 08:34:30,211 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 08:34:30,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1009490.0, ans=0.2 2024-08-11 08:34:36,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1009490.0, ans=0.0 2024-08-11 08:34:42,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1009490.0, ans=0.0 2024-08-11 08:34:48,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14000, loss[loss=0.1249, beats_loss=0.009589, ecapa_loss=0.0002101, whisper_loss=0.1133, over 20420.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01133, ecapa_loss=0.0002041, whisper_loss=0.09554, over 3952832.07 frames. ], batch size: 78, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:35:10,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1009690.0, ans=0.0 2024-08-11 08:35:27,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.44 vs. limit=22.5 2024-08-11 08:35:50,410 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 08:35:57,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.710e+01 3.006e+01 3.538e+01 6.784e+01, threshold=6.013e+01, percent-clipped=1.0 2024-08-11 08:36:05,492 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 08:36:10,563 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 08:36:12,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14050, loss[loss=0.1243, beats_loss=0.00967, ecapa_loss=0.0002162, whisper_loss=0.1125, over 18784.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01137, ecapa_loss=0.0002019, whisper_loss=0.09566, over 3899055.84 frames. ], batch size: 74, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:36:14,772 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 08:36:18,598 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 08:36:40,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1010190.0, ans=0.0 2024-08-11 08:36:55,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1010290.0, ans=0.0 2024-08-11 08:37:21,072 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 08:37:37,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14100, loss[loss=0.1054, beats_loss=0.01177, ecapa_loss=0.0002236, whisper_loss=0.09135, over 22821.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01135, ecapa_loss=0.0002022, whisper_loss=0.09524, over 3891991.69 frames. ], batch size: 92, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:37:39,408 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
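The `Clipping_scale=2.0, grad-norm quartiles ...` lines summarize recent gradient norms as five numbers (apparently min, Q1, median, Q3, max) plus a clipping threshold; in every such line above, the threshold equals `Clipping_scale` times the reported median (e.g. 2.0 × 3.006e+01 ≈ 6.013e+01). A sketch of that statistic under those assumptions, with hypothetical names:

```python
import statistics

def clipping_stats(grad_norms, clipping_scale=2.0):
    """Summarize gradient norms the way the log lines appear to:
    (min, Q1, median, Q3, max), threshold = clipping_scale * median,
    and the percentage of norms exceeding the threshold."""
    norms = sorted(grad_norms)
    q1, median, q3 = statistics.quantiles(norms, n=4)
    threshold = clipping_scale * median
    pct_clipped = 100.0 * sum(n > threshold for n in norms) / len(norms)
    return (norms[0], q1, median, q3, norms[-1]), threshold, pct_clipped

quartiles, threshold, pct = clipping_stats([10.0, 20.0, 30.0, 40.0, 90.0])
print(threshold, pct)  # 60.0 20.0
```

Tying the threshold to a running median (rather than a fixed constant) keeps the clip level tracking the typical gradient magnitude as training progresses.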
16 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 08:37:47,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-11 08:37:57,965 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 08:38:07,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1010690.0, ans=0.1 2024-08-11 08:38:11,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1010790.0, ans=0.07 2024-08-11 08:38:32,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1010890.0, ans=0.0 2024-08-11 08:38:48,472 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 08:38:49,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.624e+01 2.945e+01 3.408e+01 4.744e+01, threshold=5.889e+01, percent-clipped=0.0 2024-08-11 08:38:59,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-11 08:39:05,128 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14150, loss[loss=0.1026, beats_loss=0.01231, ecapa_loss=0.0002034, whisper_loss=0.08828, over 22773.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01137, ecapa_loss=0.0002027, whisper_loss=0.09478, over 3882656.20 frames. ], batch size: 94, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:39:05,285 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 08:39:12,801 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 08:39:16,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1011090.0, ans=0.125 2024-08-11 08:39:41,050 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 08:39:42,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=15.0 2024-08-11 08:39:54,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1011290.0, ans=0.2 2024-08-11 08:39:55,578 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 08:39:57,342 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 08:40:02,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1011390.0, ans=0.2 2024-08-11 08:40:06,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1011390.0, ans=0.125 2024-08-11 08:40:19,593 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-11 08:40:21,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1011490.0, ans=0.0 2024-08-11 08:40:22,876 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
28 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 08:40:25,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1011490.0, ans=0.125 2024-08-11 08:40:31,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14200, loss[loss=0.1099, beats_loss=0.01046, ecapa_loss=0.0002022, whisper_loss=0.09743, over 19219.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01138, ecapa_loss=0.0002026, whisper_loss=0.09501, over 3906773.73 frames. ], batch size: 75, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:40:43,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011590.0, ans=0.1 2024-08-11 08:40:53,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1011690.0, ans=0.125 2024-08-11 08:41:00,784 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 08:41:03,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1011790.0, ans=0.1 2024-08-11 08:41:10,148 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 08:41:13,874 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 08:42:10,570 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 08:42:12,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.734e+01 3.043e+01 3.584e+01 5.331e+01, threshold=6.086e+01, percent-clipped=0.0 2024-08-11 08:42:29,554 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14250, loss[loss=0.1238, beats_loss=0.01083, ecapa_loss=0.0002194, whisper_loss=0.1108, over 22268.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01138, ecapa_loss=0.0002017, whisper_loss=0.0949, over 3936564.16 frames. ], batch size: 89, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:42:54,756 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 08:43:02,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1012190.0, ans=0.07 2024-08-11 08:43:36,128 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 08:43:38,214 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 15 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 08:43:57,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14300, loss[loss=0.07198, beats_loss=0.01234, ecapa_loss=0.0002696, whisper_loss=0.05694, over 15115.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01142, ecapa_loss=0.0002006, whisper_loss=0.09428, over 3924140.34 frames. ], batch size: 66, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:44:03,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1012590.0, ans=0.125 2024-08-11 08:44:23,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1012690.0, ans=0.125 2024-08-11 08:44:27,914 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 08:44:32,571 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 08:44:42,702 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-11 08:44:44,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1012790.0, ans=0.0 2024-08-11 08:44:45,352 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 08:44:49,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0 2024-08-11 08:45:04,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2024-08-11 08:45:05,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.720e+01 3.044e+01 3.421e+01 5.497e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:45:11,490 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-11 08:45:19,354 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14350, loss[loss=0.1156, beats_loss=0.01006, ecapa_loss=0.0001928, whisper_loss=0.1036, over 15252.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01144, ecapa_loss=0.0002021, whisper_loss=0.09353, over 3900788.83 frames. ], batch size: 56, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:45:23,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1013090.0, ans=0.125 2024-08-11 08:45:35,453 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 08:45:42,120 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 08:45:43,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1013190.0, ans=0.125 2024-08-11 08:45:46,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2024-08-11 08:45:55,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1013290.0, ans=0.0 2024-08-11 08:46:00,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1013290.0, ans=0.0 2024-08-11 08:46:09,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-11 08:46:24,013 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 08:46:41,045 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14400, loss[loss=0.09406, beats_loss=0.01103, ecapa_loss=0.0002038, whisper_loss=0.08099, over 18866.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01143, ecapa_loss=0.000203, whisper_loss=0.09402, over 3911067.25 frames. ], batch size: 75, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:46:42,538 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 08:46:52,123 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 08:47:03,639 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 08:47:19,982 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 08:47:46,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.704e+01 3.131e+01 3.618e+01 5.413e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 08:48:00,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 7, batch 14450, loss[loss=0.1412, beats_loss=0.006809, ecapa_loss=0.0002337, whisper_loss=0.132, over 15709.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01143, ecapa_loss=0.0002029, whisper_loss=0.09427, over 3893085.81 frames. ], batch size: 59, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:48:15,003 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-11 08:48:17,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1014190.0, ans=0.2 2024-08-11 08:48:37,285 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 08:48:41,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-11 08:48:41,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-11 08:48:46,689 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.895e-01 2024-08-11 08:49:46,119 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 0, loss[loss=0.1071, beats_loss=0.01109, ecapa_loss=0.0002464, whisper_loss=0.09356, over 21388.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01109, ecapa_loss=0.0002464, whisper_loss=0.09356, over 21388.00 frames. 
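The learning rate in these lines decays slowly within epoch 7 (8.71e-03 down to 8.68e-03) and then steps down to 8.17e-03 when epoch 8 begins, consistent with a schedule that decays in both batch index and epoch, driven by `base_lr=0.045`, `lr_batches=7500`, `lr_epochs=3.5` from the config header. A sketch of an Eden-style schedule as commonly published for icefall (the exact batch indices behind these lr readings are not recoverable from this log, so no numeric match is attempted):

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden-style learning rate: decays with both batch index and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Monotone decay in the epoch, matching the 8.68e-03 -> 8.17e-03 step
# seen at the epoch 7 -> 8 boundary:
print(eden_lr(0.045, 100_000, 7) > eden_lr(0.045, 100_000, 8))  # True
```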
], batch size: 91, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:49:46,120 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 08:50:29,105 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on ASR_libri: loss=0.2579, beats_loss=0, ecapa_loss=0.0006499, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 08:50:45,183 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on SV_voxceleb1: loss=0.005446, beats_loss=0, ecapa_loss=0.0005446, whisper_loss=0, over 939242.00 frames. 2024-08-11 08:51:46,536 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8307, 1.9239, 1.9444, 1.5703, 1.2913, 1.8601, 2.5306, 1.6275], device='cuda:1') 2024-08-11 08:52:49,604 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on AT_audioset: loss=0.02532, beats_loss=0.02532, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 08:52:49,607 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 08:54:00,016 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 08:54:18,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-11 08:54:23,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1014770.0, ans=0.07 2024-08-11 08:54:25,306 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 08:54:53,581 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 08:55:05,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 50, loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0002204, whisper_loss=0.08981, over 17497.00 frames. 
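The three validation lines above show that each validation set activates only its own teacher losses: `ASR_libri` has `beats_loss=0`, `SV_voxceleb1` keeps only `ecapa_loss`, and `AT_audioset` keeps only `beats_loss`, while the total is still the same weighted sum used in training (ecapa scaled by 10.0). A sketch of that per-task masking, with illustrative names:

```python
# Per-task validation: inactive teachers contribute exactly zero, and the
# total is still the configured weighted sum (ecapa_loss_scale=10.0).
SCALES = {"beats": 1.0, "ecapa": 10.0, "whisper": 1.0}

def valid_loss(beats=0.0, ecapa=0.0, whisper=0.0):
    return (SCALES["beats"] * beats
            + SCALES["ecapa"] * ecapa
            + SCALES["whisper"] * whisper)

# Matches the three Epoch 8 validation lines above:
print(round(valid_loss(ecapa=0.0006499, whisper=0.2514), 4))   # 0.2579   (ASR_libri)
print(round(valid_loss(ecapa=0.0005446), 6))                   # 0.005446 (SV_voxceleb1)
print(round(valid_loss(beats=0.02532), 5))                     # 0.02532  (AT_audioset)
```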
], tot_loss[loss=0.1085, beats_loss=0.01072, ecapa_loss=0.000212, whisper_loss=0.09566, over 905094.31 frames. ], batch size: 67, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:55:08,631 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 08:55:12,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.926e+01 3.335e+01 3.829e+01 6.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-11 08:55:25,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1014970.0, ans=0.0 2024-08-11 08:56:07,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1015170.0, ans=0.125 2024-08-11 08:56:15,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1015170.0, ans=0.2 2024-08-11 08:56:16,530 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 08:57:07,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 100, loss[loss=0.1017, beats_loss=0.009325, ecapa_loss=0.0002517, whisper_loss=0.08984, over 22204.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01083, ecapa_loss=0.0002041, whisper_loss=0.09322, over 1537391.99 frames. ], batch size: 91, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:57:09,042 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 08:57:18,713 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 08:57:31,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1015570.0, ans=0.125 2024-08-11 08:57:51,934 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 08:57:56,588 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 08:58:09,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1015670.0, ans=0.125 2024-08-11 08:58:16,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1015770.0, ans=0.05 2024-08-11 08:58:25,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-11 08:58:39,857 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-11 08:58:43,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1015870.0, ans=0.2 2024-08-11 08:58:58,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 150, loss[loss=0.1098, beats_loss=0.009082, ecapa_loss=0.0002134, whisper_loss=0.09856, over 18523.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0109, ecapa_loss=0.0002054, whisper_loss=0.09312, over 2073450.42 frames. 
], batch size: 69, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:59:04,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.999e+01 3.323e+01 3.859e+01 6.934e+01, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 08:59:57,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1016270.0, ans=0.125 2024-08-11 09:00:21,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1016370.0, ans=0.0 2024-08-11 09:00:24,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 200, loss[loss=0.1213, beats_loss=0.008451, ecapa_loss=0.0002881, whisper_loss=0.11, over 18455.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01084, ecapa_loss=0.0002058, whisper_loss=0.09336, over 2448425.11 frames. ], batch size: 75, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:00:24,531 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 09:00:29,103 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 09:00:30,525 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 09:00:57,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1016670.0, ans=0.125 2024-08-11 09:00:58,151 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 09:01:05,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1016670.0, ans=0.0 2024-08-11 09:01:15,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1016770.0, ans=0.07 2024-08-11 09:01:29,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1016870.0, ans=0.125 2024-08-11 09:01:44,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 250, loss[loss=0.09731, beats_loss=0.01405, ecapa_loss=0.000182, whisper_loss=0.08143, over 21841.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0109, ecapa_loss=0.0002036, whisper_loss=0.09342, over 2746613.72 frames. ], batch size: 88, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:01:48,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.577e+01 2.891e+01 3.229e+01 6.128e+01, threshold=5.781e+01, percent-clipped=0.0 2024-08-11 09:02:28,647 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 09:02:38,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1017270.0, ans=0.125 2024-08-11 09:02:41,159 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 09:02:54,311 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 09:02:59,178 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-11 09:03:01,539 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 300, loss[loss=0.1051, beats_loss=0.009165, ecapa_loss=0.0002129, whisper_loss=0.09381, over 16581.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01102, ecapa_loss=0.0002044, whisper_loss=0.09294, over 2983730.85 frames. ], batch size: 63, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:03:25,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1017570.0, ans=0.0 2024-08-11 09:03:29,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-11 09:03:43,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1017670.0, ans=0.0 2024-08-11 09:03:51,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=15.0 2024-08-11 09:04:12,210 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-11 09:04:17,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 350, loss[loss=0.1137, beats_loss=0.01196, ecapa_loss=0.0001629, whisper_loss=0.1001, over 20645.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01115, ecapa_loss=0.0002023, whisper_loss=0.09175, over 3163030.48 frames. ], batch size: 78, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:04:20,954 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 19 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 09:04:22,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.490e+01 2.836e+01 3.239e+01 6.329e+01, threshold=5.671e+01, percent-clipped=2.0 2024-08-11 09:04:37,078 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 09:04:38,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1018070.0, ans=0.0 2024-08-11 09:04:57,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1018170.0, ans=0.0 2024-08-11 09:05:08,089 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 09:05:23,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1018370.0, ans=0.1 2024-08-11 09:05:33,330 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 400, loss[loss=0.1217, beats_loss=0.01194, ecapa_loss=0.000197, whisper_loss=0.1078, over 14793.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01121, ecapa_loss=0.0002008, whisper_loss=0.09208, over 3284426.47 frames. ], batch size: 59, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:05:33,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1018470.0, ans=0.125 2024-08-11 09:05:44,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1018470.0, ans=0.125 2024-08-11 09:05:57,648 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-11 09:06:07,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. 
limit=22.5 2024-08-11 09:06:23,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1018770.0, ans=0.125 2024-08-11 09:06:27,404 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-11 09:06:46,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1018870.0, ans=0.125 2024-08-11 09:06:51,075 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 450, loss[loss=0.1013, beats_loss=0.01272, ecapa_loss=0.0001746, whisper_loss=0.08682, over 22961.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0113, ecapa_loss=0.0001991, whisper_loss=0.0917, over 3391445.57 frames. ], batch size: 92, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:06:55,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.612e+01 2.893e+01 3.369e+01 4.521e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-11 09:07:09,180 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 09:07:09,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1019070.0, ans=0.125 2024-08-11 09:07:43,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1019270.0, ans=0.125 2024-08-11 09:07:46,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1019270.0, ans=0.125 2024-08-11 09:07:54,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1019370.0, ans=0.0 2024-08-11 09:08:08,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1019470.0, ans=0.125 2024-08-11 09:08:09,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 500, loss[loss=0.1029, beats_loss=0.009415, ecapa_loss=0.0001993, whisper_loss=0.09146, over 18779.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01119, ecapa_loss=0.0001981, whisper_loss=0.09265, over 3487421.00 frames. ], batch size: 73, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:08:15,548 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 09:08:18,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1019470.0, ans=0.1 2024-08-11 09:08:31,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1019570.0, ans=0.2 2024-08-11 09:08:38,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1019570.0, ans=0.2 2024-08-11 09:08:40,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1019570.0, ans=0.0 2024-08-11 09:08:50,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2024-08-11 09:08:51,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1019670.0, ans=0.2 2024-08-11 09:08:54,384 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 09:09:03,228 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 09:09:12,945 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 38 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 09:09:13,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1019770.0, ans=0.0 2024-08-11 09:09:13,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-08-11 09:09:18,723 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 09:09:32,372 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 550, loss[loss=0.09946, beats_loss=0.01121, ecapa_loss=0.0001976, whisper_loss=0.08627, over 18970.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0111, ecapa_loss=0.0001974, whisper_loss=0.09373, over 3561599.37 frames. ], batch size: 74, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:09:37,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.649e+01 3.106e+01 3.487e+01 7.469e+01, threshold=6.212e+01, percent-clipped=4.0 2024-08-11 09:09:49,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1020070.0, ans=0.125 2024-08-11 09:09:53,220 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 09:10:01,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1020170.0, ans=0.125 2024-08-11 09:10:05,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1020170.0, ans=0.125 2024-08-11 09:10:05,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1020170.0, ans=0.125 2024-08-11 09:10:13,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1020170.0, ans=0.125 2024-08-11 09:10:16,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1020270.0, ans=0.125 2024-08-11 09:10:34,920 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 09:10:39,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-11 09:10:41,841 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 09:10:47,472 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 600, loss[loss=0.1045, beats_loss=0.01186, ecapa_loss=0.000235, whisper_loss=0.09025, over 18191.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01117, ecapa_loss=0.000197, whisper_loss=0.09374, over 3619375.66 frames. ], batch size: 80, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:11:29,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-08-11 09:11:33,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1020770.0, ans=0.125 2024-08-11 09:11:35,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2024-08-11 09:11:36,551 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 39 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 09:11:50,757 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 09:12:00,819 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-11 09:12:01,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-11 09:12:06,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 650, loss[loss=0.1218, beats_loss=0.009271, ecapa_loss=0.0002192, whisper_loss=0.1103, over 22904.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01124, ecapa_loss=0.0001978, whisper_loss=0.09295, over 3669388.04 frames. ], batch size: 91, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:12:10,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.651e+01 2.850e+01 3.204e+01 4.737e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 09:12:12,385 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 09:12:20,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1021070.0, ans=0.035 2024-08-11 09:12:23,212 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:12:41,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1021170.0, ans=0.0 2024-08-11 09:12:47,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1021170.0, ans=0.0 2024-08-11 09:12:47,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1021170.0, ans=0.125 2024-08-11 09:13:04,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021270.0, ans=0.1 2024-08-11 09:13:16,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2024-08-11 09:13:21,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 700, loss[loss=0.08091, beats_loss=0.01032, ecapa_loss=0.0002301, whisper_loss=0.06829, over 16638.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01119, ecapa_loss=0.0001975, whisper_loss=0.09361, over 3701576.74 frames. 
], batch size: 66, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:13:22,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1021470.0, ans=0.09899494936611666 2024-08-11 09:13:32,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1021470.0, ans=0.0 2024-08-11 09:13:42,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1021570.0, ans=10.0 2024-08-11 09:13:44,935 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 09:14:03,168 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 09:14:10,453 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 09:14:26,182 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 09:14:28,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1021870.0, ans=0.125 2024-08-11 09:14:30,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1021870.0, ans=0.0 2024-08-11 09:14:37,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 750, loss[loss=0.08842, beats_loss=0.01263, ecapa_loss=0.0001979, whisper_loss=0.07381, over 16624.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01116, ecapa_loss=0.0001971, whisper_loss=0.09364, over 3708165.70 frames. 
], batch size: 64, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:14:42,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.660e+01 3.127e+01 3.627e+01 6.783e+01, threshold=6.254e+01, percent-clipped=6.0 2024-08-11 09:14:46,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=12.0 2024-08-11 09:14:57,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-08-11 09:15:48,600 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 09:15:51,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1022370.0, ans=0.0 2024-08-11 09:15:54,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 800, loss[loss=0.07761, beats_loss=0.01289, ecapa_loss=0.0002231, whisper_loss=0.06249, over 19226.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01113, ecapa_loss=0.0001974, whisper_loss=0.09333, over 3741556.45 frames. ], batch size: 84, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:15:59,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022470.0, ans=0.1 2024-08-11 09:16:01,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1022470.0, ans=0.0 2024-08-11 09:16:01,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=8.0 2024-08-11 09:16:04,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.06 vs. 
limit=15.0 2024-08-11 09:16:08,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1022570.0, ans=0.025 2024-08-11 09:16:08,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1022570.0, ans=0.0 2024-08-11 09:16:24,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-11 09:16:44,298 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 09:16:59,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-08-11 09:17:00,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2024-08-11 09:17:07,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 850, loss[loss=0.09337, beats_loss=0.01313, ecapa_loss=0.0001682, whisper_loss=0.07856, over 23359.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01117, ecapa_loss=0.0001972, whisper_loss=0.09229, over 3731476.38 frames. 
], batch size: 95, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:17:11,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.661e+01 2.916e+01 3.361e+01 8.910e+01, threshold=5.831e+01, percent-clipped=1.0 2024-08-11 09:17:16,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1022970.0, ans=0.125 2024-08-11 09:17:27,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1023070.0, ans=10.0 2024-08-11 09:17:29,042 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 09:17:35,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1023170.0, ans=0.125 2024-08-11 09:17:38,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1023170.0, ans=0.0 2024-08-11 09:17:41,411 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 09:17:47,455 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 26 from LS+wenet, 14 from Vox, 16 fro AS 2024-08-11 09:17:48,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1023170.0, ans=0.0 2024-08-11 09:17:56,999 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
29 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 09:17:57,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1023270.0, ans=0.1 2024-08-11 09:18:03,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1023270.0, ans=0.125 2024-08-11 09:18:03,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1023270.0, ans=0.0 2024-08-11 09:18:13,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1023370.0, ans=0.125 2024-08-11 09:18:15,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1023370.0, ans=0.125 2024-08-11 09:18:21,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 900, loss[loss=0.1054, beats_loss=0.01237, ecapa_loss=0.0001531, whisper_loss=0.09155, over 23746.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0001969, whisper_loss=0.09247, over 3741176.27 frames. ], batch size: 92, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:18:22,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1023470.0, ans=0.05 2024-08-11 09:18:26,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1023470.0, ans=0.125 2024-08-11 09:18:30,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-11 09:18:39,015 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 09:18:39,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1023570.0, ans=0.0 2024-08-11 09:18:43,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1023570.0, ans=0.0 2024-08-11 09:18:46,610 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 09:18:49,937 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-11 09:18:52,047 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 09:18:55,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1023670.0, ans=0.015 2024-08-11 09:19:03,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1023670.0, ans=0.125 2024-08-11 09:19:04,671 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-11 09:19:21,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1023770.0, ans=0.125 2024-08-11 09:19:27,265 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 20 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-11 09:19:31,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1023870.0, ans=0.1 2024-08-11 09:19:33,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1023870.0, ans=10.0 2024-08-11 09:19:36,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 950, loss[loss=0.08952, beats_loss=0.01394, ecapa_loss=0.0001709, whisper_loss=0.07386, over 19905.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0001952, whisper_loss=0.09196, over 3761194.84 frames. ], batch size: 79, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:19:39,198 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:19:40,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.622e+01 2.876e+01 3.425e+01 6.209e+01, threshold=5.753e+01, percent-clipped=1.0 2024-08-11 09:19:43,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1023970.0, ans=0.125 2024-08-11 09:20:14,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1024170.0, ans=0.2 2024-08-11 09:20:17,412 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 09:20:50,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-11 09:20:55,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1024370.0, ans=0.125 2024-08-11 09:21:00,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1000, loss[loss=0.116, beats_loss=0.01246, ecapa_loss=0.0001689, whisper_loss=0.1019, over 20567.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001949, whisper_loss=0.09304, over 3800977.70 frames. ], batch size: 82, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:21:41,432 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 09:21:48,946 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
36 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 09:21:57,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1024770.0, ans=0.2 2024-08-11 09:22:03,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1024770.0, ans=0.0 2024-08-11 09:22:07,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1024770.0, ans=0.0 2024-08-11 09:22:12,906 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 09:22:24,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1024870.0, ans=0.125 2024-08-11 09:22:32,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1050, loss[loss=0.1018, beats_loss=0.01187, ecapa_loss=0.0002011, whisper_loss=0.08795, over 23081.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01124, ecapa_loss=0.0001936, whisper_loss=0.09314, over 3821516.94 frames. ], batch size: 92, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:22:33,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1024970.0, ans=0.125 2024-08-11 09:22:39,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.754e+01 3.061e+01 3.548e+01 9.955e+01, threshold=6.122e+01, percent-clipped=1.0 2024-08-11 09:22:49,547 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 09:23:12,131 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 09:23:13,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1025170.0, ans=0.0 2024-08-11 09:23:28,712 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 09:23:36,919 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-11 09:23:47,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1025270.0, ans=0.125 2024-08-11 09:24:01,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025370.0, ans=0.1 2024-08-11 09:24:13,017 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:24:21,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1100, loss[loss=0.1091, beats_loss=0.009396, ecapa_loss=0.0002092, whisper_loss=0.09762, over 16794.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01117, ecapa_loss=0.0001938, whisper_loss=0.09337, over 3813064.97 frames. ], batch size: 62, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:24:21,807 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 09:24:39,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1025570.0, ans=0.1 2024-08-11 09:24:55,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1025570.0, ans=0.125 2024-08-11 09:25:10,477 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 09:25:27,204 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 09:26:08,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1150, loss[loss=0.09918, beats_loss=0.01248, ecapa_loss=0.0001523, whisper_loss=0.08518, over 16097.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0112, ecapa_loss=0.0001934, whisper_loss=0.09389, over 3808438.52 frames. ], batch size: 62, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:26:13,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1025970.0, ans=0.0 2024-08-11 09:26:14,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.696e+01 3.045e+01 3.408e+01 7.482e+01, threshold=6.090e+01, percent-clipped=2.0 2024-08-11 09:26:14,505 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 09:26:19,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1025970.0, ans=0.125 2024-08-11 09:26:20,609 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 09:26:38,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1026070.0, ans=0.0 2024-08-11 09:27:14,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1026270.0, ans=0.1 2024-08-11 09:27:31,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1026370.0, ans=0.125 2024-08-11 09:27:54,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1200, loss[loss=0.09928, beats_loss=0.01071, ecapa_loss=0.0002038, whisper_loss=0.08653, over 15255.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01126, ecapa_loss=0.0001932, whisper_loss=0.0937, over 3813015.60 frames. 
], batch size: 61, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:28:01,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1026470.0, ans=0.025 2024-08-11 09:28:13,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-08-11 09:28:14,440 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 09:28:35,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1026670.0, ans=0.0 2024-08-11 09:28:47,961 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 11 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 09:28:53,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1026770.0, ans=0.125 2024-08-11 09:28:59,012 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-11 09:29:01,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1026870.0, ans=0.0 2024-08-11 09:29:09,765 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.024e-01 2024-08-11 09:29:12,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1250, loss[loss=0.1038, beats_loss=0.01287, ecapa_loss=0.0001513, whisper_loss=0.08947, over 16015.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01132, ecapa_loss=0.0001938, whisper_loss=0.09238, over 3787452.61 frames. 
], batch size: 60, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:29:13,356 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.449e+05 2024-08-11 09:29:17,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.549e+01 2.780e+01 3.273e+01 6.263e+01, threshold=5.560e+01, percent-clipped=1.0 2024-08-11 09:29:46,646 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 09:29:48,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1027170.0, ans=0.0 2024-08-11 09:29:51,477 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 09:29:59,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1027270.0, ans=0.1 2024-08-11 09:30:07,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1027270.0, ans=0.0 2024-08-11 09:30:09,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2024-08-11 09:30:15,333 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 from AS 2024-08-11 09:30:15,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1027370.0, ans=0.125 2024-08-11 09:30:20,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1027370.0, ans=0.125 2024-08-11 09:30:27,283 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1300, loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001929, whisper_loss=0.08979, over 20201.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01131, ecapa_loss=0.000194, whisper_loss=0.09221, over 3798271.67 frames. ], batch size: 81, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:30:38,250 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 09:30:46,974 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 from AS 2024-08-11 09:30:51,220 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 09:31:06,009 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 09:31:11,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1027670.0, ans=0.125 2024-08-11 09:31:21,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1027770.0, ans=0.0 2024-08-11 09:31:27,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1027770.0, ans=0.125 2024-08-11 09:31:44,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1350, loss[loss=0.1178, beats_loss=0.009235, ecapa_loss=0.0001877, whisper_loss=0.1066, over 18928.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01131, ecapa_loss=0.0001932, whisper_loss=0.09243, over 3815387.11 frames. ], batch size: 73, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:31:46,721 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 09:31:49,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.558e+01 2.922e+01 3.559e+01 4.960e+01, threshold=5.843e+01, percent-clipped=0.0 2024-08-11 09:32:20,034 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
33 from LS+wenet, 21 from Vox, 40 from AS 2024-08-11 09:32:24,325 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 from AS 2024-08-11 09:32:27,051 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 from AS 2024-08-11 09:32:29,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028270.0, ans=0.1 2024-08-11 09:32:32,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1028270.0, ans=0.0 2024-08-11 09:32:51,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1028370.0, ans=0.125 2024-08-11 09:32:59,710 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1400, loss[loss=0.1, beats_loss=0.0103, ecapa_loss=0.0002358, whisper_loss=0.08737, over 16654.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01119, ecapa_loss=0.0001937, whisper_loss=0.09271, over 3810525.05 frames. ], batch size: 72, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:33:10,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1028470.0, ans=0.125 2024-08-11 09:33:12,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1028470.0, ans=0.035 2024-08-11 09:33:20,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-11 09:33:20,929 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS 2024-08-11 09:33:50,432 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-11 09:33:51,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.19 vs. limit=6.0 2024-08-11 09:33:51,800 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-11 09:34:06,958 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 09:34:28,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1450, loss[loss=0.1351, beats_loss=0.009031, ecapa_loss=0.0001718, whisper_loss=0.1244, over 16394.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0112, ecapa_loss=0.0001934, whisper_loss=0.09248, over 3801973.42 frames. ], batch size: 59, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:34:33,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.516e+01 2.871e+01 3.149e+01 4.386e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 09:34:50,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1029070.0, ans=0.125 2024-08-11 09:34:54,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1029070.0, ans=0.125 2024-08-11 09:35:06,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1029170.0, ans=0.07 2024-08-11 09:35:35,643 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
18 from LS+wenet, 30 from Vox, 31 from AS 2024-08-11 09:35:40,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1029370.0, ans=0.04949747468305833 2024-08-11 09:35:45,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1029370.0, ans=0.0 2024-08-11 09:35:48,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1500, loss[loss=0.1133, beats_loss=0.01057, ecapa_loss=0.0002047, whisper_loss=0.1007, over 19616.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01121, ecapa_loss=0.0001934, whisper_loss=0.09224, over 3803414.94 frames. ], batch size: 77, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:36:02,109 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 17 from LS+wenet, 33 from Vox, 32 from AS 2024-08-11 09:36:02,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1029470.0, ans=0.125 2024-08-11 09:36:03,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1029570.0, ans=10.0 2024-08-11 09:36:07,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-11 09:36:11,242 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-11 09:36:25,727 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 09:36:34,146 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 09:36:41,777 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-11 09:36:43,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1029770.0, ans=0.1 2024-08-11 09:36:50,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1029870.0, ans=0.125 2024-08-11 09:36:52,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1029870.0, ans=0.1 2024-08-11 09:36:56,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1029870.0, ans=0.0 2024-08-11 09:37:07,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1550, loss[loss=0.07988, beats_loss=0.01238, ecapa_loss=0.0002363, whisper_loss=0.06514, over 17179.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01134, ecapa_loss=0.000192, whisper_loss=0.09156, over 3803646.61 frames. ], batch size: 72, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:37:11,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.727e+01 2.976e+01 3.507e+01 6.642e+01, threshold=5.952e+01, percent-clipped=2.0 2024-08-11 09:37:14,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1029970.0, ans=0.1 2024-08-11 09:37:17,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1029970.0, ans=0.1 2024-08-11 09:37:32,928 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 15 from Vox, 40 from AS 2024-08-11 09:37:53,318 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 09:38:01,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1030270.0, ans=0.125 2024-08-11 09:38:04,332 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 from AS 2024-08-11 09:38:12,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1030370.0, ans=0.2 2024-08-11 09:38:14,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1030370.0, ans=0.0 2024-08-11 09:38:16,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1030370.0, ans=0.0 2024-08-11 09:38:20,224 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 11 from LS+wenet, 16 from Vox, 38 from AS 2024-08-11 09:38:26,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1600, loss[loss=0.08701, beats_loss=0.01297, ecapa_loss=0.0002157, whisper_loss=0.07188, over 18921.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01138, ecapa_loss=0.0001906, whisper_loss=0.09149, over 3807580.42 frames. ], batch size: 80, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:38:36,457 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 10 from LS+wenet, 23 from Vox, 29 from AS 2024-08-11 09:38:44,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1030570.0, ans=0.0 2024-08-11 09:38:55,363 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
23 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 09:38:55,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1030570.0, ans=0.125 2024-08-11 09:39:04,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-11 09:39:07,304 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 21 from Vox, 48 from AS 2024-08-11 09:39:26,914 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 from AS 2024-08-11 09:39:31,337 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 09:39:43,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1650, loss[loss=0.1372, beats_loss=0.008839, ecapa_loss=0.0001849, whisper_loss=0.1265, over 21588.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01125, ecapa_loss=0.0001914, whisper_loss=0.09278, over 3824511.71 frames. ], batch size: 81, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:39:48,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.611e+01 2.904e+01 3.448e+01 5.228e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-11 09:39:52,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2024-08-11 09:39:59,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031070.0, ans=0.1 2024-08-11 09:40:09,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1031070.0, ans=0.0 2024-08-11 09:40:16,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=12.0 2024-08-11 09:40:26,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1031270.0, ans=0.125 2024-08-11 09:40:56,745 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 from AS 2024-08-11 09:40:57,851 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1700, loss[loss=0.09555, beats_loss=0.01255, ecapa_loss=0.000152, whisper_loss=0.08148, over 17727.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001915, whisper_loss=0.093, over 3831769.92 frames. ], batch size: 66, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:41:07,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=12.0 2024-08-11 09:41:19,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1031570.0, ans=0.0 2024-08-11 09:41:33,394 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 from AS 2024-08-11 09:41:39,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1031770.0, ans=0.125 2024-08-11 09:41:40,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1031770.0, ans=0.2 2024-08-11 09:41:46,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1031770.0, ans=0.125 2024-08-11 09:42:09,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1750, loss[loss=0.08774, beats_loss=0.01189, ecapa_loss=0.0001386, whisper_loss=0.07446, over 15458.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001923, whisper_loss=0.09266, over 3806827.21 frames. 
], batch size: 56, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:42:13,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.694e+01 3.096e+01 3.648e+01 5.495e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 09:42:13,623 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 from AS 2024-08-11 09:42:16,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1031970.0, ans=0.0 2024-08-11 09:42:25,125 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-11 09:42:28,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1032070.0, ans=0.0 2024-08-11 09:42:29,036 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 from AS 2024-08-11 09:43:00,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-08-11 09:43:11,993 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 09:43:13,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1032370.0, ans=0.09899494936611666 2024-08-11 09:43:15,794 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 09:43:21,901 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1800, loss[loss=0.08547, beats_loss=0.01456, ecapa_loss=0.000163, whisper_loss=0.06929, over 18688.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0113, ecapa_loss=0.000192, whisper_loss=0.0917, over 3826569.41 frames. 
], batch size: 76, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:43:57,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2024-08-11 09:44:14,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.34 vs. limit=22.5 2024-08-11 09:44:19,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5 2024-08-11 09:44:35,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1850, loss[loss=0.1212, beats_loss=0.007393, ecapa_loss=0.0002527, whisper_loss=0.1113, over 16041.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01123, ecapa_loss=0.0001924, whisper_loss=0.092, over 3812282.82 frames. ], batch size: 61, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:44:39,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.564e+01 2.931e+01 3.381e+01 4.621e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-11 09:44:50,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1033070.0, ans=0.125 2024-08-11 09:45:00,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.13 vs. 
limit=15.0 2024-08-11 09:45:12,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1033170.0, ans=0.125 2024-08-11 09:45:36,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1033370.0, ans=0.125 2024-08-11 09:45:39,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2024-08-11 09:45:41,972 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 16 from Vox, 50 from AS 2024-08-11 09:45:47,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1900, loss[loss=0.1092, beats_loss=0.01129, ecapa_loss=0.0001763, whisper_loss=0.09616, over 20067.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01125, ecapa_loss=0.0001933, whisper_loss=0.09161, over 3803457.46 frames. ], batch size: 75, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:46:00,951 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 from AS 2024-08-11 09:46:02,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1033570.0, ans=0.125 2024-08-11 09:46:05,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2024-08-11 09:46:06,673 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 from AS 2024-08-11 09:46:18,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033670.0, ans=0.1 2024-08-11 09:46:19,732 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-11 09:46:27,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1033670.0, ans=0.0 2024-08-11 09:46:29,949 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS 2024-08-11 09:46:41,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1033770.0, ans=0.5 2024-08-11 09:47:00,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 1950, loss[loss=0.1287, beats_loss=0.01105, ecapa_loss=0.0001869, whisper_loss=0.1158, over 23335.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.0001947, whisper_loss=0.09191, over 3809725.04 frames. ], batch size: 90, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:47:01,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2024-08-11 09:47:05,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.682e+01 2.998e+01 3.589e+01 5.098e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 09:47:05,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1033970.0, ans=0.2 2024-08-11 09:47:13,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=1034070.0, ans=10.0 2024-08-11 09:47:29,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1034170.0, ans=0.125 2024-08-11 09:47:35,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1034170.0, ans=0.2 2024-08-11 09:47:42,863 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.945e-02 2024-08-11 09:47:55,089 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 09:48:07,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2024-08-11 09:48:13,520 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2000, loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001738, whisper_loss=0.09038, over 22032.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01122, ecapa_loss=0.0001954, whisper_loss=0.09232, over 3832381.99 frames. 
], batch size: 84, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:48:17,760 WARNING [optim.py:496] (1/4) Scaling gradients by 0.059571195393800735, model_norm_threshold=59.96577072143555 2024-08-11 09:48:17,968 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.97, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.877e+05, grad_sumsq=1.108e+05, orig_rms_sq=8.917e+00 2024-08-11 09:48:24,521 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 from AS 2024-08-11 09:48:38,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034570.0, ans=0.1 2024-08-11 09:48:41,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1034570.0, ans=0.2 2024-08-11 09:49:27,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2050, loss[loss=0.1084, beats_loss=0.01134, ecapa_loss=0.0001941, whisper_loss=0.09513, over 22363.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01131, ecapa_loss=0.0001966, whisper_loss=0.09132, over 3819711.09 frames. ], batch size: 87, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:49:31,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.681e+01 2.944e+01 3.350e+01 1.007e+03, threshold=5.888e+01, percent-clipped=2.0 2024-08-11 09:49:37,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1034970.0, ans=0.1 2024-08-11 09:49:38,828 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
21 from LS+wenet, 21 from Vox, 20 from AS 2024-08-11 09:49:39,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034970.0, ans=0.1 2024-08-11 09:49:58,880 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 20 from Vox, 23 from AS 2024-08-11 09:50:04,287 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.904e+02 2024-08-11 09:50:08,677 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 10 from Vox, 27 from AS 2024-08-11 09:50:10,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1035270.0, ans=0.125 2024-08-11 09:50:17,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035270.0, ans=0.1 2024-08-11 09:50:19,472 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 from AS 2024-08-11 09:50:19,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-11 09:50:26,143 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 09:50:38,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1035370.0, ans=0.0 2024-08-11 09:50:40,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2100, loss[loss=0.099, beats_loss=0.01122, ecapa_loss=0.0002502, whisper_loss=0.08528, over 14822.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0113, ecapa_loss=0.0001964, whisper_loss=0.09158, over 3805169.06 frames. 
], batch size: 62, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:50:50,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1035470.0, ans=0.0 2024-08-11 09:51:07,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1035570.0, ans=0.0 2024-08-11 09:51:13,916 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 from AS 2024-08-11 09:51:18,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1035670.0, ans=0.0 2024-08-11 09:51:34,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1035770.0, ans=0.2 2024-08-11 09:51:39,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1035870.0, ans=0.125 2024-08-11 09:51:43,067 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 09:51:45,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1035870.0, ans=0.025 2024-08-11 09:51:54,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2150, loss[loss=0.1232, beats_loss=0.009329, ecapa_loss=0.0002151, whisper_loss=0.1117, over 21108.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01132, ecapa_loss=0.0001969, whisper_loss=0.09198, over 3845841.29 frames. ], batch size: 84, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:51:58,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.546e+01 2.848e+01 3.381e+01 6.507e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 09:51:59,796 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
14 from LS+wenet, 14 from Vox, 28 from AS 2024-08-11 09:52:03,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1035970.0, ans=0.125 2024-08-11 09:52:20,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1036070.0, ans=0.125 2024-08-11 09:52:56,843 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 24 from Vox, 25 from AS 2024-08-11 09:53:06,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2200, loss[loss=0.129, beats_loss=0.01009, ecapa_loss=0.0002276, whisper_loss=0.1166, over 19640.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01129, ecapa_loss=0.0001973, whisper_loss=0.09289, over 3825887.99 frames. ], batch size: 78, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:53:07,051 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 from AS 2024-08-11 09:53:15,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2024-08-11 09:53:31,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1036570.0, ans=0.125 2024-08-11 09:53:36,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2024-08-11 09:53:39,348 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
27 from LS+wenet, 19 from Vox, 32 from AS 2024-08-11 09:53:39,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1036670.0, ans=0.1 2024-08-11 09:53:51,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1036770.0, ans=0.125 2024-08-11 09:54:02,188 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 09:54:07,987 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 from AS 2024-08-11 09:54:15,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2250, loss[loss=0.1288, beats_loss=0.01016, ecapa_loss=0.0002071, whisper_loss=0.1166, over 22069.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.000198, whisper_loss=0.09331, over 3846923.97 frames. ], batch size: 87, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:54:19,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.681e+01 2.914e+01 3.367e+01 5.391e+01, threshold=5.828e+01, percent-clipped=0.0 2024-08-11 09:54:22,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2024-08-11 09:54:31,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2024-08-11 09:54:48,312 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 from AS 2024-08-11 09:54:51,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.70 vs. 
limit=15.0 2024-08-11 09:54:53,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1037170.0, ans=0.04949747468305833 2024-08-11 09:54:54,829 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.208e+00 2024-08-11 09:55:16,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1037370.0, ans=0.125 2024-08-11 09:55:17,942 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 11 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 09:55:21,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2300, loss[loss=0.1307, beats_loss=0.008001, ecapa_loss=0.000193, whisper_loss=0.1208, over 16712.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.000199, whisper_loss=0.09341, over 3842108.73 frames. ], batch size: 59, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:55:21,784 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 09:55:24,328 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 09:55:38,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1037570.0, ans=0.0 2024-08-11 09:56:09,066 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 09:56:14,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1037870.0, ans=0.2 2024-08-11 09:56:25,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1037870.0, ans=0.125 2024-08-11 09:56:26,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2024-08-11 09:56:27,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2350, loss[loss=0.1102, beats_loss=0.01193, ecapa_loss=0.0002114, whisper_loss=0.09619, over 21087.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01138, ecapa_loss=0.0001999, whisper_loss=0.0934, over 3813983.40 frames. ], batch size: 83, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:56:31,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.661e+01 3.016e+01 3.402e+01 1.211e+02, threshold=6.032e+01, percent-clipped=3.0 2024-08-11 09:56:35,810 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 09:56:44,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1038070.0, ans=0.125 2024-08-11 09:56:48,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1038070.0, ans=0.125 2024-08-11 09:57:00,635 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 09:57:04,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1038170.0, ans=10.0 2024-08-11 09:57:06,978 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 09:57:07,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1038270.0, ans=0.0 2024-08-11 09:57:23,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1038370.0, ans=0.5 2024-08-11 09:57:33,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2400, loss[loss=0.1078, beats_loss=0.01154, ecapa_loss=0.0002213, whisper_loss=0.09407, over 15592.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01138, ecapa_loss=0.0001988, whisper_loss=0.09361, over 3824323.71 frames. ], batch size: 65, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:57:39,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-11 09:57:42,249 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 09:57:46,125 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 09:58:14,071 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 09:58:17,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-11 09:58:23,568 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 09:58:31,032 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 09:58:39,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2450, loss[loss=0.0904, beats_loss=0.01197, ecapa_loss=0.0001805, whisper_loss=0.07663, over 18245.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01127, ecapa_loss=0.0001988, whisper_loss=0.09367, over 3842522.20 frames. ], batch size: 72, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:58:43,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.701e+01 2.979e+01 3.423e+01 5.204e+01, threshold=5.958e+01, percent-clipped=0.0 2024-08-11 09:58:45,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1038970.0, ans=0.125 2024-08-11 09:58:48,408 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 09:58:49,715 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 09:59:11,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-11 09:59:16,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1039170.0, ans=0.125 2024-08-11 09:59:25,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-11 09:59:29,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1039270.0, ans=0.125 2024-08-11 09:59:44,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2500, loss[loss=0.1257, beats_loss=0.009311, ecapa_loss=0.0001868, whisper_loss=0.1145, over 17814.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01122, ecapa_loss=0.000199, whisper_loss=0.09422, over 3860800.48 frames. ], batch size: 65, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:59:45,701 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 09:59:48,189 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-11 09:59:57,790 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.081e-03 2024-08-11 10:00:17,200 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 10:00:20,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.28 vs. limit=15.0 2024-08-11 10:00:26,219 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 10:00:33,081 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 10:00:38,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1039870.0, ans=0.125 2024-08-11 10:00:46,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5 2024-08-11 10:00:49,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2550, loss[loss=0.09866, beats_loss=0.01071, ecapa_loss=0.0002114, whisper_loss=0.08584, over 21401.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01126, ecapa_loss=0.0001981, whisper_loss=0.094, over 3871969.68 frames. 
], batch size: 86, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 10:00:50,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1039970.0, ans=0.0 2024-08-11 10:00:57,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.767e+01 3.292e+01 3.693e+01 5.376e+01, threshold=6.584e+01, percent-clipped=0.0 2024-08-11 10:01:07,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.83 vs. limit=12.0 2024-08-11 10:01:09,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1040070.0, ans=0.0 2024-08-11 10:01:09,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1040070.0, ans=0.0 2024-08-11 10:01:22,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1040170.0, ans=0.0 2024-08-11 10:01:37,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1040270.0, ans=0.125 2024-08-11 10:01:46,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1040370.0, ans=0.0 2024-08-11 10:01:50,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1040370.0, ans=0.2 2024-08-11 10:01:50,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1040370.0, ans=0.0 2024-08-11 10:01:59,623 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2600, loss[loss=0.1009, beats_loss=0.01223, ecapa_loss=0.0003107, whisper_loss=0.08552, over 20802.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0001991, whisper_loss=0.09411, over 3863539.98 frames. 
], batch size: 93, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:02:12,484 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-11 10:02:21,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1040570.0, ans=0.0 2024-08-11 10:02:37,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2024-08-11 10:02:39,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1040770.0, ans=0.125 2024-08-11 10:02:45,633 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 10:02:46,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2024-08-11 10:02:47,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1040770.0, ans=0.125 2024-08-11 10:03:01,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1040870.0, ans=0.125 2024-08-11 10:03:05,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2650, loss[loss=0.1203, beats_loss=0.009352, ecapa_loss=0.0002369, whisper_loss=0.1086, over 19026.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01116, ecapa_loss=0.0002034, whisper_loss=0.09423, over 3891331.57 frames. 
], batch size: 77, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:03:09,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.707e+01 2.925e+01 3.318e+01 6.568e+01, threshold=5.849e+01, percent-clipped=0.0 2024-08-11 10:03:10,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-08-11 10:03:25,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1041070.0, ans=0.125 2024-08-11 10:03:47,725 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 10:03:49,167 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 10:04:00,293 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 10:04:11,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2700, loss[loss=0.1007, beats_loss=0.01243, ecapa_loss=0.0002273, whisper_loss=0.08596, over 20040.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01118, ecapa_loss=0.0002025, whisper_loss=0.09381, over 3896188.45 frames. ], batch size: 88, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:04:27,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1041570.0, ans=0.125 2024-08-11 10:04:33,952 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 10:04:36,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=22.5 2024-08-11 10:04:44,893 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 10:04:46,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1041670.0, ans=0.125 2024-08-11 10:04:54,053 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 10:05:02,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1041770.0, ans=0.0 2024-08-11 10:05:05,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1041870.0, ans=0.1 2024-08-11 10:05:06,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1041870.0, ans=0.035 2024-08-11 10:05:09,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1041870.0, ans=0.1 2024-08-11 10:05:14,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1041870.0, ans=0.2 2024-08-11 10:05:18,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2750, loss[loss=0.09845, beats_loss=0.009168, ecapa_loss=0.0002047, whisper_loss=0.08723, over 18585.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0112, ecapa_loss=0.0002013, whisper_loss=0.09336, over 3881397.53 frames. ], batch size: 73, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:05:22,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.665e+01 2.980e+01 3.281e+01 5.234e+01, threshold=5.959e+01, percent-clipped=0.0 2024-08-11 10:05:27,540 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 10:05:29,446 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.66 vs. limit=15.0 2024-08-11 10:06:07,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1042270.0, ans=0.125 2024-08-11 10:06:24,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2800, loss[loss=0.1129, beats_loss=0.01229, ecapa_loss=0.0001975, whisper_loss=0.09861, over 22324.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0002007, whisper_loss=0.09316, over 3865412.86 frames. ], batch size: 90, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:06:46,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1042570.0, ans=0.125 2024-08-11 10:06:47,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1042570.0, ans=0.125 2024-08-11 10:06:54,316 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 10:06:56,945 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 10:06:58,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1042670.0, ans=0.2 2024-08-11 10:07:01,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1042670.0, ans=0.2 2024-08-11 10:07:06,521 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 10:07:10,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1042770.0, ans=0.125 2024-08-11 10:07:20,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=22.5 2024-08-11 10:07:21,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1042870.0, ans=0.125 2024-08-11 10:07:29,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2850, loss[loss=0.1012, beats_loss=0.011, ecapa_loss=0.000179, whisper_loss=0.08842, over 15540.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0002009, whisper_loss=0.09323, over 3852077.51 frames. ], batch size: 62, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:07:33,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.749e+01 2.990e+01 3.438e+01 5.063e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 10:07:33,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1042970.0, ans=0.2 2024-08-11 10:07:38,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-08-11 10:07:57,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1043170.0, ans=0.125 2024-08-11 10:07:57,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1043170.0, ans=0.2 2024-08-11 10:08:14,100 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 10:08:15,713 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 10:08:18,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1043270.0, ans=0.125 2024-08-11 10:08:24,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1043370.0, ans=0.0 2024-08-11 10:08:30,645 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 10:08:35,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2900, loss[loss=0.106, beats_loss=0.01269, ecapa_loss=0.0001655, whisper_loss=0.09164, over 21328.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0001999, whisper_loss=0.09284, over 3837402.47 frames. ], batch size: 86, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:09:14,363 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 10:09:42,037 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 2950, loss[loss=0.09168, beats_loss=0.01215, ecapa_loss=0.0002111, whisper_loss=0.07743, over 18972.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01135, ecapa_loss=0.0002015, whisper_loss=0.09304, over 3852185.05 frames. ], batch size: 78, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:09:45,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.608e+01 2.908e+01 3.326e+01 5.190e+01, threshold=5.815e+01, percent-clipped=0.0 2024-08-11 10:09:55,153 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 10:10:02,277 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 10:10:08,674 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 10:10:24,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1044270.0, ans=0.0 2024-08-11 10:10:40,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-11 10:10:47,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3000, loss[loss=0.09878, beats_loss=0.01432, ecapa_loss=0.0001604, whisper_loss=0.08286, over 22503.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0002005, whisper_loss=0.09304, over 3890813.42 frames. ], batch size: 90, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:10:47,950 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 10:11:27,187 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006456, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 10:11:45,302 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on SV_voxceleb1: loss=0.005368, beats_loss=0, ecapa_loss=0.0005368, whisper_loss=0, over 939242.00 frames. 2024-08-11 10:13:42,421 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on AT_audioset: loss=0.02512, beats_loss=0.02512, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 10:13:42,425 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 10:13:57,474 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 10:14:01,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1044570.0, ans=0.125 2024-08-11 10:14:02,886 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
31 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 10:14:11,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-11 10:14:37,731 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 10:14:49,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-08-11 10:14:49,863 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3050, loss[loss=0.105, beats_loss=0.01122, ecapa_loss=0.0002428, whisper_loss=0.09136, over 21051.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01133, ecapa_loss=0.0002015, whisper_loss=0.0935, over 3891337.27 frames. ], batch size: 90, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:14:51,488 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 29 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-11 10:14:53,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.751e+01 3.093e+01 3.441e+01 4.563e+01, threshold=6.185e+01, percent-clipped=0.0 2024-08-11 10:14:59,109 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 10:15:00,594 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 10:15:05,824 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 10:15:11,318 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 10:15:15,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. 
limit=22.5 2024-08-11 10:15:17,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1045170.0, ans=0.125 2024-08-11 10:15:23,105 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 10:15:30,286 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 10:15:30,724 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-11 10:15:34,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1045270.0, ans=0.125 2024-08-11 10:15:42,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1045370.0, ans=0.125 2024-08-11 10:15:52,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1045370.0, ans=15.0 2024-08-11 10:15:56,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3100, loss[loss=0.1183, beats_loss=0.01006, ecapa_loss=0.0002321, whisper_loss=0.106, over 21605.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01129, ecapa_loss=0.0002024, whisper_loss=0.09453, over 3871780.22 frames. ], batch size: 88, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:16:10,092 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 10:16:19,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1045570.0, ans=0.0 2024-08-11 10:16:21,786 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 10:16:27,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1045670.0, ans=0.125 2024-08-11 10:16:28,493 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 10:16:36,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1045770.0, ans=0.125 2024-08-11 10:16:40,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1045770.0, ans=0.125 2024-08-11 10:16:41,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1045770.0, ans=0.0 2024-08-11 10:16:43,065 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-11 10:16:48,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1045870.0, ans=0.2 2024-08-11 10:16:49,881 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 10:17:00,459 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0893079861998558, model_norm_threshold=61.852699279785156 2024-08-11 10:17:00,636 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.682e+05, grad_sumsq=5.221e+04, orig_rms_sq=8.968e+00 2024-08-11 10:17:03,152 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3150, loss[loss=0.1059, beats_loss=0.0121, ecapa_loss=0.0002079, whisper_loss=0.09177, over 16747.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01122, ecapa_loss=0.0002027, whisper_loss=0.09509, over 3841544.20 frames. 
], batch size: 67, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:17:07,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.851e+01 3.278e+01 3.632e+01 6.926e+02, threshold=6.555e+01, percent-clipped=1.0 2024-08-11 10:17:08,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1045970.0, ans=0.125 2024-08-11 10:17:25,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1046070.0, ans=0.05 2024-08-11 10:17:38,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1046170.0, ans=0.125 2024-08-11 10:17:58,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=12.0 2024-08-11 10:18:09,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3200, loss[loss=0.1122, beats_loss=0.007379, ecapa_loss=0.0002077, whisper_loss=0.1028, over 17991.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01118, ecapa_loss=0.0002026, whisper_loss=0.09543, over 3844264.53 frames. ], batch size: 70, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:18:14,585 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.388e+03 2024-08-11 10:18:16,710 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 25 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 10:18:18,078 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 10:18:21,904 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 10:18:29,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046570.0, ans=0.1 2024-08-11 10:18:43,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1046670.0, ans=0.125 2024-08-11 10:18:45,833 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 10:19:16,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3250, loss[loss=0.1072, beats_loss=0.0131, ecapa_loss=0.0002089, whisper_loss=0.09205, over 22579.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01111, ecapa_loss=0.000203, whisper_loss=0.09623, over 3860963.26 frames. ], batch size: 90, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:19:20,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.734e+01 3.207e+01 3.832e+01 6.451e+01, threshold=6.414e+01, percent-clipped=0.0 2024-08-11 10:19:38,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1047070.0, ans=0.125 2024-08-11 10:19:45,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1047170.0, ans=0.1 2024-08-11 10:19:59,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1047270.0, ans=0.125 2024-08-11 10:20:18,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-08-11 10:20:20,331 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
21 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 10:20:22,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3300, loss[loss=0.09732, beats_loss=0.01015, ecapa_loss=0.0002769, whisper_loss=0.0844, over 16018.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01117, ecapa_loss=0.0002039, whisper_loss=0.09569, over 3898878.15 frames. ], batch size: 71, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:20:27,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1047470.0, ans=0.125 2024-08-11 10:20:39,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-11 10:20:49,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047670.0, ans=0.1 2024-08-11 10:20:59,855 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS 2024-08-11 10:21:02,472 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 10:21:09,429 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 10:21:16,243 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 from AS 2024-08-11 10:21:21,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-11 10:21:22,445 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 10:21:30,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3350, loss[loss=0.1132, beats_loss=0.009, ecapa_loss=0.00023, whisper_loss=0.1019, over 21710.00 frames.
], tot_loss[loss=0.1088, beats_loss=0.0112, ecapa_loss=0.0002028, whisper_loss=0.09556, over 3890437.71 frames. ], batch size: 91, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:21:30,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1047970.0, ans=0.0 2024-08-11 10:21:34,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.767e+01 3.123e+01 3.740e+01 5.333e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-11 10:21:36,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2024-08-11 10:21:45,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1048070.0, ans=0.125 2024-08-11 10:21:47,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1048070.0, ans=0.0 2024-08-11 10:22:00,012 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 from AS 2024-08-11 10:22:13,431 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 27 from Vox, 27 from AS 2024-08-11 10:22:16,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1048270.0, ans=0.2 2024-08-11 10:22:18,822 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 10:22:26,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1048370.0, ans=0.125 2024-08-11 10:22:30,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs.
limit=8.0 2024-08-11 10:22:36,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048470.0, ans=0.1 2024-08-11 10:22:36,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048470.0, ans=0.1 2024-08-11 10:22:37,334 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3400, loss[loss=0.08941, beats_loss=0.01031, ecapa_loss=0.0001919, whisper_loss=0.07719, over 15299.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01123, ecapa_loss=0.0002015, whisper_loss=0.09484, over 3853301.60 frames. ], batch size: 60, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:22:47,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1048470.0, ans=0.0 2024-08-11 10:22:51,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2024-08-11 10:23:18,911 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 20 from Vox, 27 from AS 2024-08-11 10:23:28,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-11 10:23:29,552 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 10:23:34,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs.
limit=15.0 2024-08-11 10:23:45,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1048970.0, ans=0.0 2024-08-11 10:23:46,473 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3450, loss[loss=0.08709, beats_loss=0.009435, ecapa_loss=0.000203, whisper_loss=0.07563, over 14511.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01129, ecapa_loss=0.0002025, whisper_loss=0.09387, over 3873544.44 frames. ], batch size: 56, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:23:50,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.572e+01 2.937e+01 3.389e+01 1.105e+02, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 10:23:52,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=12.0 2024-08-11 10:24:40,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1049370.0, ans=0.125 2024-08-11 10:24:54,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1049470.0, ans=0.05 2024-08-11 10:24:55,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3500, loss[loss=0.111, beats_loss=0.008093, ecapa_loss=0.0002344, whisper_loss=0.1006, over 17176.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0002037, whisper_loss=0.0928, over 3862260.70 frames. ], batch size: 69, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:25:15,631 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
29 from LS+wenet, 17 from Vox, 36 from AS 2024-08-11 10:25:33,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1049670.0, ans=0.125 2024-08-11 10:25:40,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1049770.0, ans=0.125 2024-08-11 10:25:48,461 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 from AS 2024-08-11 10:25:50,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1049870.0, ans=0.0 2024-08-11 10:25:51,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1049870.0, ans=0.125 2024-08-11 10:25:59,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1049870.0, ans=0.125 2024-08-11 10:26:02,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3550, loss[loss=0.106, beats_loss=0.01074, ecapa_loss=0.0002014, whisper_loss=0.09321, over 22139.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01137, ecapa_loss=0.0002028, whisper_loss=0.09187, over 3829014.05 frames. ], batch size: 89, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:26:04,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1049970.0, ans=0.1 2024-08-11 10:26:07,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.679e+01 2.987e+01 3.672e+01 5.992e+01, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 10:26:11,495 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-11 10:26:15,508 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
28 from LS+wenet, 26 from Vox, 37 from AS 2024-08-11 10:26:15,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1050070.0, ans=0.125 2024-08-11 10:26:24,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1050070.0, ans=0.125 2024-08-11 10:26:26,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1050070.0, ans=0.0 2024-08-11 10:26:30,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1050170.0, ans=0.2 2024-08-11 10:26:33,747 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-11 10:26:35,086 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 from AS 2024-08-11 10:26:39,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1050170.0, ans=0.07 2024-08-11 10:26:49,386 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.087e-03 2024-08-11 10:26:56,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=12.0 2024-08-11 10:27:12,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3600, loss[loss=0.07989, beats_loss=0.01342, ecapa_loss=0.0001864, whisper_loss=0.06461, over 13416.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01135, ecapa_loss=0.0002046, whisper_loss=0.09256, over 3853701.52 frames.
], batch size: 54, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:27:15,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1050470.0, ans=0.5 2024-08-11 10:27:24,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1050570.0, ans=0.2 2024-08-11 10:27:38,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2024-08-11 10:27:54,884 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 from AS 2024-08-11 10:27:57,372 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 10:28:00,149 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 21 from Vox, 26 from AS 2024-08-11 10:28:01,423 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 from AS 2024-08-11 10:28:04,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1050770.0, ans=0.0 2024-08-11 10:28:09,851 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 10:28:22,173 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3650, loss[loss=0.09895, beats_loss=0.01207, ecapa_loss=0.0001516, whisper_loss=0.08537, over 22184.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01135, ecapa_loss=0.0002037, whisper_loss=0.09246, over 3829785.02 frames.
], batch size: 87, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:28:26,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.681e+01 3.041e+01 3.404e+01 5.123e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 10:28:30,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1050970.0, ans=10.0 2024-08-11 10:28:31,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1050970.0, ans=0.0 2024-08-11 10:28:48,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1051070.0, ans=0.0 2024-08-11 10:28:52,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051170.0, ans=0.1 2024-08-11 10:28:57,016 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 from AS 2024-08-11 10:28:57,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=12.0 2024-08-11 10:29:05,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-11 10:29:14,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1051270.0, ans=0.125 2024-08-11 10:29:15,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1051270.0, ans=0.125 2024-08-11 10:29:16,660 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
32 from LS+wenet, 20 from Vox, 39 from AS 2024-08-11 10:29:28,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1051370.0, ans=0.0 2024-08-11 10:29:32,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1051470.0, ans=0.0 2024-08-11 10:29:33,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3700, loss[loss=0.1089, beats_loss=0.01058, ecapa_loss=0.0002237, whisper_loss=0.09605, over 22972.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01131, ecapa_loss=0.0002038, whisper_loss=0.09249, over 3827834.73 frames. ], batch size: 94, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:29:41,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1051470.0, ans=0.125 2024-08-11 10:29:48,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1051570.0, ans=0.1 2024-08-11 10:29:51,283 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 from AS 2024-08-11 10:30:15,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1051770.0, ans=0.125 2024-08-11 10:30:18,614 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 from AS 2024-08-11 10:30:45,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3750, loss[loss=0.1185, beats_loss=0.009949, ecapa_loss=0.0002043, whisper_loss=0.1065, over 22180.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0002029, whisper_loss=0.09261, over 3830176.94 frames.
], batch size: 90, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:30:49,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.786e+01 3.057e+01 3.501e+01 5.299e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 10:30:54,049 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 10:30:57,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-08-11 10:31:11,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052070.0, ans=0.1 2024-08-11 10:31:13,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-11 10:31:25,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-11 10:31:55,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3800, loss[loss=0.08902, beats_loss=0.008728, ecapa_loss=0.0002334, whisper_loss=0.07796, over 14249.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01137, ecapa_loss=0.0002031, whisper_loss=0.09267, over 3829692.26 frames. ], batch size: 55, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:32:06,779 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 10:32:08,189 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 from AS 2024-08-11 10:32:09,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1052570.0, ans=0.125 2024-08-11 10:32:12,156 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts.
22 from LS+wenet, 21 from Vox, 22 from AS 2024-08-11 10:32:16,509 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 10:32:29,248 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 10:32:41,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1052770.0, ans=0.05 2024-08-11 10:32:51,147 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 from AS 2024-08-11 10:33:06,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3850, loss[loss=0.09919, beats_loss=0.01211, ecapa_loss=0.0001879, whisper_loss=0.08519, over 22689.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01139, ecapa_loss=0.0002033, whisper_loss=0.09232, over 3840540.71 frames. ], batch size: 93, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:33:09,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1052970.0, ans=0.0 2024-08-11 10:33:10,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.764e+01 3.232e+01 3.837e+01 5.936e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-11 10:33:10,984 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 28 from Vox, 45 from AS 2024-08-11 10:33:13,667 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 10:33:13,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1052970.0, ans=0.125 2024-08-11 10:33:22,465 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 10:33:25,092 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts.
26 from LS+wenet, 20 from Vox, 35 from AS 2024-08-11 10:33:32,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1053070.0, ans=0.125 2024-08-11 10:33:43,365 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS 2024-08-11 10:33:52,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1053270.0, ans=0.125 2024-08-11 10:33:58,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-11 10:34:00,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1053370.0, ans=0.125 2024-08-11 10:34:16,330 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3900, loss[loss=0.1139, beats_loss=0.01068, ecapa_loss=0.000205, whisper_loss=0.1011, over 19788.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01138, ecapa_loss=0.0002046, whisper_loss=0.09216, over 3859098.98 frames. ], batch size: 78, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:34:19,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1053470.0, ans=0.0 2024-08-11 10:34:39,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1053570.0, ans=0.125 2024-08-11 10:34:44,244 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 30 from LS+wenet, 19 from Vox, 23 from AS 2024-08-11 10:34:47,659 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts.
23 from LS+wenet, 23 from Vox, 37 from AS 2024-08-11 10:34:52,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1053670.0, ans=0.5 2024-08-11 10:35:16,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=15.0 2024-08-11 10:35:17,101 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 34 from LS+wenet, 13 from Vox, 21 from AS 2024-08-11 10:35:20,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1053870.0, ans=0.2 2024-08-11 10:35:23,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1053870.0, ans=0.2 2024-08-11 10:35:25,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1053870.0, ans=0.125 2024-08-11 10:35:27,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 3950, loss[loss=0.1116, beats_loss=0.01237, ecapa_loss=0.0001724, whisper_loss=0.09747, over 19256.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0002039, whisper_loss=0.09313, over 3853904.80 frames. ], batch size: 75, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:35:32,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.817e+01 3.170e+01 3.825e+01 1.516e+02, threshold=6.340e+01, percent-clipped=2.0 2024-08-11 10:36:01,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1054170.0, ans=0.125 2024-08-11 10:36:07,146 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts.
17 from LS+wenet, 22 from Vox, 28 from AS 2024-08-11 10:36:42,820 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4000, loss[loss=0.1396, beats_loss=0.009735, ecapa_loss=0.0002543, whisper_loss=0.1274, over 22918.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01127, ecapa_loss=0.0002039, whisper_loss=0.09379, over 3859294.60 frames. ], batch size: 93, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:37:04,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1054570.0, ans=0.2 2024-08-11 10:37:11,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1054670.0, ans=0.2 2024-08-11 10:37:19,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1054670.0, ans=0.125 2024-08-11 10:37:32,874 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 from AS 2024-08-11 10:37:34,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1054770.0, ans=0.2 2024-08-11 10:37:36,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1054770.0, ans=0.125 2024-08-11 10:37:51,194 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 from AS 2024-08-11 10:37:54,823 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 22 from Vox, 16 from AS 2024-08-11 10:37:58,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4050, loss[loss=0.0865, beats_loss=0.01221, ecapa_loss=0.0001884, whisper_loss=0.07241, over 15063.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01122, ecapa_loss=0.0002028, whisper_loss=0.09443, over 3877205.83 frames.
], batch size: 61, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:38:03,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.646e+01 2.921e+01 3.336e+01 5.282e+01, threshold=5.841e+01, percent-clipped=0.0 2024-08-11 10:38:30,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1055170.0, ans=0.125 2024-08-11 10:38:37,398 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 from AS 2024-08-11 10:38:37,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1055170.0, ans=0.125 2024-08-11 10:38:41,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1055170.0, ans=0.0 2024-08-11 10:38:46,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1055270.0, ans=0.0 2024-08-11 10:38:47,827 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
21 from LS+wenet, 19 from Vox, 49 from AS 2024-08-11 10:38:50,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1055270.0, ans=0.1 2024-08-11 10:39:02,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1055370.0, ans=0.125 2024-08-11 10:39:07,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1055370.0, ans=0.125 2024-08-11 10:39:12,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1055370.0, ans=0.2 2024-08-11 10:39:15,082 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4100, loss[loss=0.1137, beats_loss=0.01161, ecapa_loss=0.000211, whisper_loss=0.09999, over 18346.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01127, ecapa_loss=0.0002031, whisper_loss=0.09428, over 3871330.40 frames. ], batch size: 73, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:39:20,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055470.0, ans=0.1 2024-08-11 10:39:27,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055470.0, ans=0.1 2024-08-11 10:39:37,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5 2024-08-11 10:39:38,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1055570.0, ans=0.125 2024-08-11 10:40:09,919 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-11 10:40:19,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1055870.0, ans=0.0 2024-08-11 10:40:21,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-11 10:40:28,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2024-08-11 10:40:31,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.00 vs. limit=22.5 2024-08-11 10:40:34,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4150, loss[loss=0.09787, beats_loss=0.01349, ecapa_loss=0.0001976, whisper_loss=0.08241, over 22080.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0002041, whisper_loss=0.09332, over 3873620.44 frames. ], batch size: 91, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:40:36,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1055970.0, ans=0.0 2024-08-11 10:40:38,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.691e+01 3.023e+01 3.383e+01 1.135e+02, threshold=6.046e+01, percent-clipped=2.0 2024-08-11 10:40:38,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055970.0, ans=0.1 2024-08-11 10:40:46,286 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts.
20 from LS+wenet, 7 from Vox, 26 from AS 2024-08-11 10:40:52,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:40:52,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1056070.0, ans=0.125 2024-08-11 10:41:00,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1056070.0, ans=0.0 2024-08-11 10:41:04,389 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS 2024-08-11 10:41:10,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2024-08-11 10:41:21,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1056270.0, ans=0.125 2024-08-11 10:41:30,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1056270.0, ans=0.0 2024-08-11 10:41:32,219 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 10:41:34,882 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 12 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 10:41:48,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4200, loss[loss=0.1074, beats_loss=0.01076, ecapa_loss=0.0002421, whisper_loss=0.09419, over 20285.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.0002041, whisper_loss=0.09344, over 3862046.66 frames.
], batch size: 86, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:42:01,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1056570.0, ans=0.125 2024-08-11 10:42:05,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1056570.0, ans=0.0 2024-08-11 10:42:06,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1056570.0, ans=0.125 2024-08-11 10:42:18,125 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 10:42:20,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1056670.0, ans=0.0 2024-08-11 10:42:23,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1056670.0, ans=0.1 2024-08-11 10:42:44,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1056770.0, ans=0.0 2024-08-11 10:42:53,222 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.832e-02 2024-08-11 10:42:53,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2024-08-11 10:43:02,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4250, loss[loss=0.1157, beats_loss=0.009748, ecapa_loss=0.0002488, whisper_loss=0.1035, over 16917.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01139, ecapa_loss=0.0002027, whisper_loss=0.09351, over 3869932.35 frames.
], batch size: 70, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:43:03,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1056970.0, ans=0.125 2024-08-11 10:43:07,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.666e+01 2.925e+01 3.281e+01 5.407e+01, threshold=5.850e+01, percent-clipped=0.0 2024-08-11 10:43:14,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2024-08-11 10:43:23,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1057070.0, ans=0.0 2024-08-11 10:43:33,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1057170.0, ans=0.125 2024-08-11 10:43:44,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1057270.0, ans=0.125 2024-08-11 10:43:56,027 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-11 10:43:56,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.62 vs. limit=5.0 2024-08-11 10:44:16,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4300, loss[loss=0.1138, beats_loss=0.01191, ecapa_loss=0.0002003, whisper_loss=0.09992, over 20142.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0002034, whisper_loss=0.09295, over 3855862.37 frames. 
], batch size: 80, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:44:34,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1057570.0, ans=0.1 2024-08-11 10:44:35,216 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 15 from Vox, 22 from AS 2024-08-11 10:44:44,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2024-08-11 10:44:49,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1057670.0, ans=0.0 2024-08-11 10:45:10,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1057770.0, ans=0.125 2024-08-11 10:45:13,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1057770.0, ans=0.125 2024-08-11 10:45:20,811 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 10:45:34,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4350, loss[loss=0.1117, beats_loss=0.009273, ecapa_loss=0.000236, whisper_loss=0.1001, over 17843.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01134, ecapa_loss=0.0002016, whisper_loss=0.09336, over 3850247.15 frames. ], batch size: 73, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:45:37,134 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 from AS 2024-08-11 10:45:38,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.592e+01 2.860e+01 3.306e+01 4.790e+01, threshold=5.719e+01, percent-clipped=0.0 2024-08-11 10:46:02,554 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
27 from LS+wenet, 11 from Vox, 37 from AS 2024-08-11 10:46:07,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1058170.0, ans=0.2 2024-08-11 10:46:10,517 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 42 from LS+wenet, 19 from Vox, 31 from AS 2024-08-11 10:46:18,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1058270.0, ans=0.0 2024-08-11 10:46:28,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-08-11 10:46:30,733 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 10:46:51,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4400, loss[loss=0.1195, beats_loss=0.01056, ecapa_loss=0.0001968, whisper_loss=0.1069, over 15048.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01125, ecapa_loss=0.0002013, whisper_loss=0.09405, over 3860376.02 frames. ], batch size: 57, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:47:00,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1058470.0, ans=0.0 2024-08-11 10:47:01,390 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-11 10:47:27,503 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 10:47:29,682 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 10:47:40,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1058770.0, ans=0.125 2024-08-11 10:47:40,409 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.534e-01 2024-08-11 10:47:40,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1058770.0, ans=0.0 2024-08-11 10:47:45,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-11 10:47:45,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5 2024-08-11 10:48:08,483 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.516e-01 2024-08-11 10:48:08,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2024-08-11 10:48:10,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1058870.0, ans=0.0 2024-08-11 10:48:13,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4450, loss[loss=0.1142, beats_loss=0.01235, ecapa_loss=0.0001959, whisper_loss=0.09994, over 22527.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01142, ecapa_loss=0.0002001, whisper_loss=0.09267, over 3876140.59 frames. ], batch size: 89, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:48:17,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.805e+01 3.007e+01 3.333e+01 6.979e+01, threshold=6.014e+01, percent-clipped=1.0 2024-08-11 10:48:39,878 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
19 from LS+wenet, 25 from Vox, 31 from AS 2024-08-11 10:49:04,134 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=12.0 2024-08-11 10:49:11,789 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. limit=10.0 2024-08-11 10:49:15,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1059370.0, ans=0.125 2024-08-11 10:49:21,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1059370.0, ans=0.0 2024-08-11 10:49:24,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059370.0, ans=0.1 2024-08-11 10:49:28,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2024-08-11 10:49:29,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4500, loss[loss=0.1216, beats_loss=0.0108, ecapa_loss=0.0002159, whisper_loss=0.1086, over 18584.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01133, ecapa_loss=0.0002006, whisper_loss=0.09337, over 3848605.04 frames. 
], batch size: 73, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:49:33,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059470.0, ans=0.1 2024-08-11 10:49:36,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1059470.0, ans=0.125 2024-08-11 10:49:37,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1059470.0, ans=0.125 2024-08-11 10:49:44,740 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 14 from Vox, 42 from AS 2024-08-11 10:50:16,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5 2024-08-11 10:50:19,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-11 10:50:28,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1059870.0, ans=0.2 2024-08-11 10:50:34,098 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 from AS 2024-08-11 10:50:44,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4550, loss[loss=0.1086, beats_loss=0.01025, ecapa_loss=0.0002301, whisper_loss=0.09605, over 14368.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01132, ecapa_loss=0.0002023, whisper_loss=0.09387, over 3869934.28 frames. 
], batch size: 58, lr: 7.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:48,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.557e+01 2.865e+01 3.375e+01 6.211e+01, threshold=5.730e+01, percent-clipped=1.0 2024-08-11 10:51:53,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1060370.0, ans=0.125 2024-08-11 10:51:57,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1060370.0, ans=0.125 2024-08-11 10:51:59,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1060470.0, ans=0.0 2024-08-11 10:52:00,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4600, loss[loss=0.1188, beats_loss=0.01023, ecapa_loss=0.000226, whisper_loss=0.1063, over 22100.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01135, ecapa_loss=0.0002027, whisper_loss=0.09355, over 3860282.79 frames. ], batch size: 91, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:52:15,943 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-11 10:52:23,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1060570.0, ans=0.125 2024-08-11 10:52:28,202 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 20 from Vox, 46 from AS 2024-08-11 10:52:31,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1060670.0, ans=0.125 2024-08-11 10:52:31,618 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.199e+00 2024-08-11 10:53:01,223 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
24 from LS+wenet, 11 from Vox, 41 from AS 2024-08-11 10:53:12,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1060870.0, ans=0.0 2024-08-11 10:53:15,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1060870.0, ans=0.0 2024-08-11 10:53:15,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1060870.0, ans=0.125 2024-08-11 10:53:16,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-11 10:53:21,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4650, loss[loss=0.09663, beats_loss=0.0104, ecapa_loss=0.0002195, whisper_loss=0.08403, over 15802.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01138, ecapa_loss=0.0002021, whisper_loss=0.0933, over 3851347.78 frames. ], batch size: 62, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:53:26,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.723e+01 3.113e+01 3.495e+01 7.663e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 10:53:26,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1060970.0, ans=0.125 2024-08-11 10:53:33,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1060970.0, ans=0.125 2024-08-11 10:53:33,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1060970.0, ans=0.2 2024-08-11 10:53:40,181 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. 
limit=5.0 2024-08-11 10:53:42,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1061070.0, ans=0.1 2024-08-11 10:53:46,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-11 10:54:00,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1061170.0, ans=0.0 2024-08-11 10:54:06,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-08-11 10:54:20,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2024-08-11 10:54:22,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1061270.0, ans=0.04949747468305833 2024-08-11 10:54:27,082 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 15 from Vox, 35 from AS 2024-08-11 10:54:27,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2024-08-11 10:54:42,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4700, loss[loss=0.1136, beats_loss=0.01182, ecapa_loss=0.0001512, whisper_loss=0.1003, over 19620.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002014, whisper_loss=0.09338, over 3859919.65 frames. 
], batch size: 73, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:54:48,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1061470.0, ans=0.0 2024-08-11 10:54:51,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1061470.0, ans=0.0 2024-08-11 10:54:52,874 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-11 10:55:01,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1061570.0, ans=0.125 2024-08-11 10:55:24,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1061670.0, ans=0.125 2024-08-11 10:55:49,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1061870.0, ans=0.0 2024-08-11 10:55:56,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061870.0, ans=0.1 2024-08-11 10:56:05,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4750, loss[loss=0.07747, beats_loss=0.01186, ecapa_loss=0.0001787, whisper_loss=0.06383, over 14469.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002018, whisper_loss=0.09328, over 3847986.51 frames. ], batch size: 57, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:56:06,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-08-11 10:56:10,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.759e+01 3.104e+01 3.569e+01 5.241e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-11 10:56:16,305 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-11 10:56:16,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1061970.0, ans=15.0 2024-08-11 10:56:28,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1062070.0, ans=0.0 2024-08-11 10:56:31,869 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 10:56:36,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1062070.0, ans=0.1 2024-08-11 10:56:36,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1062070.0, ans=0.125 2024-08-11 10:56:44,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1062170.0, ans=0.125 2024-08-11 10:56:48,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1062170.0, ans=0.0 2024-08-11 10:56:57,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1062270.0, ans=0.1 2024-08-11 10:57:17,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=15.0 2024-08-11 10:57:20,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1062370.0, ans=0.09899494936611666 2024-08-11 10:57:25,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1062370.0, ans=0.07 2024-08-11 10:57:31,636 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4800, loss[loss=0.109, beats_loss=0.01236, ecapa_loss=0.0001853, whisper_loss=0.09475, over 20080.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01139, ecapa_loss=0.0002048, whisper_loss=0.09293, over 3826255.14 frames. ], batch size: 78, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:57:40,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-11 10:57:52,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1062570.0, ans=0.0 2024-08-11 10:57:56,738 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
22 from LS+wenet, 20 from Vox, 51 from AS 2024-08-11 10:58:05,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1062670.0, ans=0.125 2024-08-11 10:58:08,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1062670.0, ans=0.125 2024-08-11 10:58:39,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1062870.0, ans=0.0 2024-08-11 10:58:51,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1062870.0, ans=0.125 2024-08-11 10:58:51,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1062870.0, ans=0.125 2024-08-11 10:58:52,818 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS 2024-08-11 10:58:55,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4850, loss[loss=0.09136, beats_loss=0.01281, ecapa_loss=0.0002066, whisper_loss=0.07648, over 20432.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01145, ecapa_loss=0.0002037, whisper_loss=0.09217, over 3831524.70 frames. ], batch size: 88, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:59:00,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.634e+01 3.190e+01 3.671e+01 5.547e+01, threshold=6.379e+01, percent-clipped=0.0 2024-08-11 10:59:10,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1063070.0, ans=0.125 2024-08-11 10:59:23,385 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.085e-01 2024-08-11 10:59:45,933 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-11 10:59:58,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1063370.0, ans=0.125 2024-08-11 11:00:15,128 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4900, loss[loss=0.1035, beats_loss=0.01291, ecapa_loss=0.0001755, whisper_loss=0.0888, over 22840.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01151, ecapa_loss=0.0002026, whisper_loss=0.09229, over 3880785.91 frames. ], batch size: 92, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:00:17,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1063470.0, ans=0.2 2024-08-11 11:00:20,824 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 from AS 2024-08-11 11:00:38,331 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 from AS 2024-08-11 11:00:41,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1063570.0, ans=0.0 2024-08-11 11:00:51,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1063670.0, ans=0.125 2024-08-11 11:01:04,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2024-08-11 11:01:26,061 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 from AS 2024-08-11 11:01:26,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1063870.0, ans=0.0 2024-08-11 11:01:27,608 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
13 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 11:01:31,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=12.0 2024-08-11 11:01:33,945 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 11:01:37,393 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 4950, loss[loss=0.08617, beats_loss=0.0147, ecapa_loss=0.0001893, whisper_loss=0.06959, over 22283.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01157, ecapa_loss=0.0002015, whisper_loss=0.09221, over 3900148.78 frames. ], batch size: 90, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:01:43,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.682e+01 3.010e+01 3.354e+01 5.437e+01, threshold=6.020e+01, percent-clipped=0.0 2024-08-11 11:02:01,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-11 11:02:14,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1064170.0, ans=0.0 2024-08-11 11:02:26,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1064170.0, ans=0.125 2024-08-11 11:02:52,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064370.0, ans=0.0 2024-08-11 11:03:00,543 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5000, loss[loss=0.1022, beats_loss=0.01126, ecapa_loss=0.0002519, whisper_loss=0.08842, over 16954.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01154, ecapa_loss=0.0002016, whisper_loss=0.09293, over 3907220.97 frames. 
], batch size: 71, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:03:02,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1064470.0, ans=0.125 2024-08-11 11:03:08,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.0 2024-08-11 11:03:11,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1064470.0, ans=0.2 2024-08-11 11:03:18,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1064570.0, ans=0.0 2024-08-11 11:03:37,950 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 11:03:38,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1064670.0, ans=0.1 2024-08-11 11:03:38,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064670.0, ans=0.1 2024-08-11 11:03:59,721 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 from AS 2024-08-11 11:04:06,760 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-11 11:04:12,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1064870.0, ans=0.125 2024-08-11 11:04:18,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.30 vs. limit=15.0 2024-08-11 11:04:24,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5050, loss[loss=0.1379, beats_loss=0.007625, ecapa_loss=0.0002018, whisper_loss=0.1282, over 17727.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01157, ecapa_loss=0.000201, whisper_loss=0.09303, over 3913432.22 frames. ], batch size: 66, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:04:30,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 2.899e+01 3.463e+01 4.526e+01, threshold=5.797e+01, percent-clipped=0.0 2024-08-11 11:04:46,682 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 18 from Vox, 51 from AS 2024-08-11 11:05:01,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0 2024-08-11 11:05:05,713 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 from AS 2024-08-11 11:05:18,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1065270.0, ans=0.125 2024-08-11 11:05:23,951 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 from AS 2024-08-11 11:05:36,885 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 11 from LS+wenet, 21 from Vox, 22 from AS 2024-08-11 11:05:40,359 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 18 from LS+wenet, 22 from Vox, 49 from AS 2024-08-11 11:05:53,652 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5100, loss[loss=0.1183, beats_loss=0.01206, ecapa_loss=0.0001982, whisper_loss=0.1043, over 22652.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01158, ecapa_loss=0.0002012, whisper_loss=0.09288, over 3896172.86 frames. ], batch size: 89, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:06:08,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. 
limit=15.0 2024-08-11 11:06:10,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-11 11:06:57,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1065770.0, ans=0.0 2024-08-11 11:06:58,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1065770.0, ans=0.125 2024-08-11 11:07:03,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1065870.0, ans=0.0 2024-08-11 11:07:13,874 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 19 from LS+wenet, 25 from Vox, 49 from AS 2024-08-11 11:07:15,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1065970.0, ans=0.125 2024-08-11 11:07:16,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5150, loss[loss=0.1286, beats_loss=0.01129, ecapa_loss=0.0001897, whisper_loss=0.1154, over 22755.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01156, ecapa_loss=0.0001999, whisper_loss=0.09277, over 3910357.33 frames. ], batch size: 89, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:07:22,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.743e+01 3.078e+01 3.597e+01 5.105e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-11 11:07:29,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1065970.0, ans=0.125 2024-08-11 11:07:37,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-11 11:07:38,248 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-11 11:07:42,998 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:07:51,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1066170.0, ans=0.125 2024-08-11 11:07:52,106 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 from AS 2024-08-11 11:08:00,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1066170.0, ans=0.2 2024-08-11 11:08:05,995 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 from AS 2024-08-11 11:08:13,791 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 from AS 2024-08-11 11:08:20,194 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 11:08:33,335 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5200, loss[loss=0.09363, beats_loss=0.01225, ecapa_loss=0.0002152, whisper_loss=0.07923, over 14521.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01147, ecapa_loss=0.0001986, whisper_loss=0.09317, over 3868991.28 frames. ], batch size: 57, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:08:42,207 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 from AS 2024-08-11 11:08:47,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1066570.0, ans=0.0 2024-08-11 11:08:53,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1066570.0, ans=0.1 2024-08-11 11:09:05,372 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
20 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 11:09:07,143 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 from AS 2024-08-11 11:09:20,349 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 11:09:24,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1066770.0, ans=0.0 2024-08-11 11:09:38,975 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 11:09:40,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1066870.0, ans=0.125 2024-08-11 11:09:41,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066870.0, ans=0.1 2024-08-11 11:09:50,929 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS 2024-08-11 11:09:52,014 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5250, loss[loss=0.1166, beats_loss=0.01308, ecapa_loss=0.0001884, whisper_loss=0.1017, over 22746.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01142, ecapa_loss=0.0001989, whisper_loss=0.09371, over 3844416.89 frames. ], batch size: 90, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:09:52,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1066970.0, ans=0.125 2024-08-11 11:09:57,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.555e+01 2.975e+01 3.407e+01 4.666e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 11:09:57,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1066970.0, ans=0.125 2024-08-11 11:10:08,152 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts.
21 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 11:10:08,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1067070.0, ans=0.125 2024-08-11 11:10:15,550 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 29 from LS+wenet, 17 from Vox, 17 from AS 2024-08-11 11:10:19,048 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 11:10:36,791 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 11:10:43,241 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 11:10:48,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.03 vs. limit=10.0 2024-08-11 11:10:50,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1067270.0, ans=0.125 2024-08-11 11:10:55,646 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 from AS 2024-08-11 11:11:09,319 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS 2024-08-11 11:11:10,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5300, loss[loss=0.1027, beats_loss=0.0132, ecapa_loss=0.0001628, whisper_loss=0.08792, over 23027.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01142, ecapa_loss=0.0001999, whisper_loss=0.09342, over 3866574.39 frames.
], batch size: 91, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:11:21,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1067470.0, ans=0.0 2024-08-11 11:11:26,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1067570.0, ans=0.1 2024-08-11 11:11:29,100 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.809e-01 2024-08-11 11:11:33,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1067570.0, ans=0.0 2024-08-11 11:11:39,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1067670.0, ans=0.1 2024-08-11 11:11:44,014 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 from AS 2024-08-11 11:12:29,411 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5350, loss[loss=0.09845, beats_loss=0.01383, ecapa_loss=0.0001576, whisper_loss=0.08305, over 21877.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01146, ecapa_loss=0.0001992, whisper_loss=0.09287, over 3833154.52 frames. ], batch size: 90, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:12:36,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.785e+01 3.077e+01 3.493e+01 6.327e+01, threshold=6.155e+01, percent-clipped=1.0 2024-08-11 11:12:47,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1067970.0, ans=0.125 2024-08-11 11:13:15,643 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
33 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 11:13:26,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1068170.0, ans=0.125 2024-08-11 11:13:35,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=15.0 2024-08-11 11:13:48,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1068270.0, ans=0.125 2024-08-11 11:14:15,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5400, loss[loss=0.1071, beats_loss=0.009479, ecapa_loss=0.0001745, whisper_loss=0.09591, over 14231.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01142, ecapa_loss=0.0001989, whisper_loss=0.09328, over 3832641.50 frames. ], batch size: 54, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:14:30,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:31,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1068570.0, ans=0.2 2024-08-11 11:14:36,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-08-11 11:14:57,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1068670.0, ans=0.125 2024-08-11 11:15:00,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1068670.0, ans=0.2 2024-08-11 11:15:14,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1068770.0, ans=0.2 2024-08-11 11:15:28,556 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts.
17 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 11:15:50,919 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5450, loss[loss=0.08921, beats_loss=0.01146, ecapa_loss=0.0002164, whisper_loss=0.07559, over 14638.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01133, ecapa_loss=0.0001996, whisper_loss=0.0938, over 3859379.43 frames. ], batch size: 61, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:15:57,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.869e+01 3.117e+01 3.592e+01 6.207e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 11:16:43,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1069170.0, ans=0.2 2024-08-11 11:16:48,203 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 18 from LS+wenet, 33 from Vox, 41 from AS 2024-08-11 11:16:52,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1069270.0, ans=10.0 2024-08-11 11:17:21,466 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS 2024-08-11 11:17:30,121 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 from AS 2024-08-11 11:17:35,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5500, loss[loss=0.1157, beats_loss=0.01164, ecapa_loss=0.0001948, whisper_loss=0.1021, over 17549.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01142, ecapa_loss=0.0001988, whisper_loss=0.09377, over 3887360.89 frames. ], batch size: 68, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:17:49,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.75 vs.
limit=22.5 2024-08-11 11:17:52,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-11 11:18:05,531 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 11:18:11,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1069570.0, ans=0.2 2024-08-11 11:18:13,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1069570.0, ans=0.0 2024-08-11 11:18:18,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1069670.0, ans=0.0 2024-08-11 11:18:20,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-11 11:19:10,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1069870.0, ans=0.125 2024-08-11 11:19:22,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5550, loss[loss=0.1185, beats_loss=0.01081, ecapa_loss=0.0002323, whisper_loss=0.1054, over 18689.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01144, ecapa_loss=0.0001994, whisper_loss=0.09369, over 3903573.05 frames. ], batch size: 77, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:19:23,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs.
limit=15.0 2024-08-11 11:19:28,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1069970.0, ans=0.0 2024-08-11 11:19:28,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.609e+01 2.954e+01 3.474e+01 6.484e+01, threshold=5.909e+01, percent-clipped=2.0 2024-08-11 11:20:17,186 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 11:20:44,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1070370.0, ans=0.125 2024-08-11 11:20:45,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1070370.0, ans=0.0 2024-08-11 11:20:53,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1070470.0, ans=0.125 2024-08-11 11:20:54,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5600, loss[loss=0.09058, beats_loss=0.01429, ecapa_loss=0.000195, whisper_loss=0.07435, over 22607.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0001996, whisper_loss=0.094, over 3891261.90 frames. ], batch size: 96, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:21:04,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-11 11:21:08,246 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts.
20 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 11:21:20,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1070670.0, ans=0.125 2024-08-11 11:21:31,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070670.0, ans=0.125 2024-08-11 11:21:31,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1070670.0, ans=0.125 2024-08-11 11:21:35,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-11 11:21:49,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1070770.0, ans=0.0 2024-08-11 11:21:52,781 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-11 11:21:57,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1070870.0, ans=0.2 2024-08-11 11:22:01,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1070870.0, ans=0.125 2024-08-11 11:22:07,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5650, loss[loss=0.1115, beats_loss=0.01245, ecapa_loss=0.0002042, whisper_loss=0.09697, over 20652.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002005, whisper_loss=0.09333, over 3882849.76 frames. ], batch size: 81, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:22:09,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs.
limit=15.0 2024-08-11 11:22:11,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.577e+01 2.929e+01 3.455e+01 8.964e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 11:22:36,155 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 11:22:39,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1071170.0, ans=0.0 2024-08-11 11:23:02,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1071270.0, ans=0.015 2024-08-11 11:23:07,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-11 11:23:20,053 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 11:23:25,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5700, loss[loss=0.08679, beats_loss=0.01038, ecapa_loss=0.0002149, whisper_loss=0.07426, over 14099.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01141, ecapa_loss=0.0002014, whisper_loss=0.09352, over 3912048.45 frames. ], batch size: 55, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:23:28,818 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS 2024-08-11 11:23:42,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1071570.0, ans=0.0 2024-08-11 11:24:04,437 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 11:24:15,815 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS 2024-08-11 11:24:21,509 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts.
23 from LS+wenet, 24 from Vox, 34 from AS 2024-08-11 11:24:26,333 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 16 from LS+wenet, 27 from Vox, 27 from AS 2024-08-11 11:24:43,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5750, loss[loss=0.1274, beats_loss=0.009017, ecapa_loss=0.0002447, whisper_loss=0.1159, over 22481.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01147, ecapa_loss=0.0002021, whisper_loss=0.09313, over 3882232.87 frames. ], batch size: 91, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:24:48,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.718e+01 3.107e+01 3.541e+01 5.804e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 11:24:53,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1071970.0, ans=0.125 2024-08-11 11:25:08,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1072070.0, ans=0.0 2024-08-11 11:25:19,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072170.0, ans=0.1 2024-08-11 11:25:26,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.79 vs. limit=22.5 2024-08-11 11:25:48,287 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
30 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 11:25:51,471 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.734e+00 2024-08-11 11:26:02,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1072470.0, ans=0.125 2024-08-11 11:26:02,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5800, loss[loss=0.1284, beats_loss=0.009722, ecapa_loss=0.0001874, whisper_loss=0.1168, over 23297.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0002007, whisper_loss=0.09324, over 3888612.58 frames. ], batch size: 89, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:26:09,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-11 11:26:17,812 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 11:26:21,118 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 from AS 2024-08-11 11:26:37,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072670.0, ans=0.1 2024-08-11 11:26:50,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2024-08-11 11:26:53,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=15.0 2024-08-11 11:27:00,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2024-08-11 11:27:02,406 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts.
18 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 11:27:12,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1072870.0, ans=0.0 2024-08-11 11:27:18,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5850, loss[loss=0.1019, beats_loss=0.01161, ecapa_loss=0.0002113, whisper_loss=0.08813, over 20806.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01142, ecapa_loss=0.0002006, whisper_loss=0.09327, over 3905554.72 frames. ], batch size: 84, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:27:23,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.774e+01 3.139e+01 3.627e+01 6.860e+01, threshold=6.277e+01, percent-clipped=1.0 2024-08-11 11:27:29,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1072970.0, ans=0.125 2024-08-11 11:27:36,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1073070.0, ans=0.125 2024-08-11 11:28:10,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1073270.0, ans=0.1 2024-08-11 11:28:12,757 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 11:28:15,202 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-11 11:28:24,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1073370.0, ans=0.025 2024-08-11 11:28:26,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1073370.0, ans=0.1 2024-08-11 11:28:31,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5900, loss[loss=0.1028, beats_loss=0.007812, ecapa_loss=0.0002771, whisper_loss=0.09222, over 17931.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01141, ecapa_loss=0.0002025, whisper_loss=0.09294, over 3897645.02 frames. ], batch size: 74, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:28:40,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1073470.0, ans=15.0 2024-08-11 11:28:41,490 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 11:28:46,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1073570.0, ans=0.125 2024-08-11 11:29:06,018 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
29 from LS+wenet, 18 from Vox, 44 from AS 2024-08-11 11:29:10,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1073670.0, ans=10.0 2024-08-11 11:29:19,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1073770.0, ans=0.0 2024-08-11 11:29:22,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1073770.0, ans=0.0 2024-08-11 11:29:34,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1073870.0, ans=0.125 2024-08-11 11:29:39,745 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS 2024-08-11 11:29:42,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 5950, loss[loss=0.1232, beats_loss=0.01162, ecapa_loss=0.0002165, whisper_loss=0.1094, over 21857.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01142, ecapa_loss=0.0002003, whisper_loss=0.09328, over 3934836.64 frames. ], batch size: 90, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:29:47,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.700e+01 3.029e+01 3.647e+01 6.302e+01, threshold=6.057e+01, percent-clipped=1.0 2024-08-11 11:29:51,785 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS 2024-08-11 11:30:08,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1074070.0, ans=0.1 2024-08-11 11:30:23,977 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts.
21 from LS+wenet, 20 from Vox, 25 from AS 2024-08-11 11:30:25,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1074270.0, ans=0.125 2024-08-11 11:30:27,742 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-11 11:30:28,267 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-11 11:30:39,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1074370.0, ans=0.125 2024-08-11 11:30:41,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-11 11:30:42,577 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 28 from Vox, 24 from AS 2024-08-11 11:30:46,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2024-08-11 11:30:48,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-11 11:30:56,114 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6000, loss[loss=0.1142, beats_loss=0.01127, ecapa_loss=0.0001697, whisper_loss=0.1012, over 22919.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01141, ecapa_loss=0.0001988, whisper_loss=0.09333, over 3910541.30 frames. ], batch size: 89, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:30:56,114 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 11:31:34,861 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006404, whisper_loss=0.2522, over 922467.00 frames.
2024-08-11 11:31:52,325 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on SV_voxceleb1: loss=0.005252, beats_loss=0, ecapa_loss=0.0005252, whisper_loss=0, over 939242.00 frames. 2024-08-11 11:32:57,176 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.5208e-03, 2.4897e-02, 1.4238e-02, 3.0793e+00, 1.0082e-05, 4.8726e-02, 4.8879e-02, 2.8629e-02], device='cuda:1') 2024-08-11 11:33:26,552 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6924, 2.2165, 2.4711, 2.3383, 3.0877, 2.0431, 2.5336, 2.2456], device='cuda:1') 2024-08-11 11:33:45,223 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on AT_audioset: loss=0.02539, beats_loss=0.02539, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 11:33:45,226 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 11:34:01,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1074570.0, ans=0.1 2024-08-11 11:34:06,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1074570.0, ans=0.0 2024-08-11 11:34:38,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1074770.0, ans=0.2 2024-08-11 11:34:39,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1074770.0, ans=0.125 2024-08-11 11:34:41,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2024-08-11 11:34:58,981 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6050, loss[loss=0.09507, beats_loss=0.01202, ecapa_loss=0.0002373, whisper_loss=0.08068, over 14246.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01141, ecapa_loss=0.0001968, whisper_loss=0.09408, over 3914715.35 frames. ], batch size: 59, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:35:03,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.749e+01 3.055e+01 3.427e+01 5.083e+01, threshold=6.111e+01, percent-clipped=0.0 2024-08-11 11:35:03,826 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 from AS 2024-08-11 11:35:29,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1075170.0, ans=0.125 2024-08-11 11:35:32,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1075170.0, ans=0.0 2024-08-11 11:35:40,535 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2024-08-11 11:35:44,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1075270.0, ans=0.04949747468305833 2024-08-11 11:35:54,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1075270.0, ans=0.125 2024-08-11 11:35:59,292 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 from AS 2024-08-11 11:36:14,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6100, loss[loss=0.1206, beats_loss=0.01129, ecapa_loss=0.000172, whisper_loss=0.1076, over 19537.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01135, ecapa_loss=0.0001977, whisper_loss=0.09393, over 3899733.15 frames.
], batch size: 75, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:36:28,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1075570.0, ans=0.125 2024-08-11 11:36:44,835 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 11:36:58,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-11 11:37:00,426 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 11:37:03,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1075770.0, ans=0.2 2024-08-11 11:37:12,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1075770.0, ans=0.125 2024-08-11 11:37:30,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6150, loss[loss=0.1029, beats_loss=0.01247, ecapa_loss=0.0001711, whisper_loss=0.08873, over 21970.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01144, ecapa_loss=0.000197, whisper_loss=0.09353, over 3893985.83 frames. ], batch size: 88, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:37:33,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2024-08-11 11:37:34,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.684e+01 3.005e+01 3.339e+01 4.754e+01, threshold=6.009e+01, percent-clipped=0.0 2024-08-11 11:37:46,314 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 11:37:53,653 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-11 11:37:55,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1076070.0, ans=0.125 2024-08-11 11:38:13,806 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 11:38:18,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1076270.0, ans=0.125 2024-08-11 11:38:19,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-11 11:38:23,539 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 11:38:29,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1076370.0, ans=0.0 2024-08-11 11:38:31,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1076370.0, ans=0.0 2024-08-11 11:38:38,109 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-11 11:38:39,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1076370.0, ans=0.125 2024-08-11 11:38:43,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6200, loss[loss=0.1206, beats_loss=0.01056, ecapa_loss=0.0001783, whisper_loss=0.1082, over 19591.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01142, ecapa_loss=0.0001973, whisper_loss=0.09354, over 3897245.36 frames.
], batch size: 76, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:39:09,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1076570.0, ans=0.125 2024-08-11 11:39:25,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1076670.0, ans=0.125 2024-08-11 11:39:59,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6250, loss[loss=0.115, beats_loss=0.0132, ecapa_loss=0.0001855, whisper_loss=0.09996, over 22690.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0001978, whisper_loss=0.09342, over 3887202.93 frames. ], batch size: 90, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:40:04,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.830e+01 2.972e+01 3.439e+01 5.876e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 11:40:04,832 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 11:40:10,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2024-08-11 11:40:24,114 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 11:40:35,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1077170.0, ans=0.07 2024-08-11 11:40:42,693 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 11:40:55,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1077270.0, ans=0.2 2024-08-11 11:41:02,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1077370.0, ans=0.0 2024-08-11 11:41:06,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2024-08-11 11:41:10,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077370.0, ans=0.1 2024-08-11 11:41:10,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1077370.0, ans=0.125 2024-08-11 11:41:13,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6300, loss[loss=0.09223, beats_loss=0.01449, ecapa_loss=0.0001909, whisper_loss=0.07583, over 22564.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01136, ecapa_loss=0.0001987, whisper_loss=0.0939, over 3893224.91 frames. ], batch size: 93, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:41:16,096 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 11:41:22,839 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 11:41:26,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1077570.0, ans=0.125 2024-08-11 11:41:31,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1077570.0, ans=0.0 2024-08-11 11:41:34,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.52 vs. 
limit=15.0 2024-08-11 11:41:43,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-11 11:42:13,880 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 11:42:24,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6350, loss[loss=0.113, beats_loss=0.0085, ecapa_loss=0.0002659, whisper_loss=0.1018, over 16207.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01134, ecapa_loss=0.0001993, whisper_loss=0.09396, over 3863113.25 frames. ], batch size: 66, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:42:25,207 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 11:42:29,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.640e+01 2.866e+01 3.160e+01 1.102e+02, threshold=5.732e+01, percent-clipped=1.0 2024-08-11 11:42:34,524 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 11:42:41,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1078070.0, ans=0.0 2024-08-11 11:42:46,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1078070.0, ans=0.0 2024-08-11 11:42:49,817 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 11:42:59,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1078170.0, ans=0.0 2024-08-11 11:43:02,573 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:43:06,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1078170.0, ans=0.125 2024-08-11 11:43:39,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6400, loss[loss=0.1156, beats_loss=0.01021, ecapa_loss=0.0002065, whisper_loss=0.1033, over 18000.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01139, ecapa_loss=0.0001989, whisper_loss=0.09364, over 3862099.20 frames. ], batch size: 72, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:43:41,403 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 11:43:47,461 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-11 11:43:48,939 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 11:43:55,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1078570.0, ans=0.035 2024-08-11 11:44:20,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.50 vs. limit=10.0 2024-08-11 11:44:20,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1078670.0, ans=0.0 2024-08-11 11:44:23,091 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 11:44:23,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. 
limit=15.0 2024-08-11 11:44:32,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078770.0, ans=0.1 2024-08-11 11:44:56,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6450, loss[loss=0.09377, beats_loss=0.01331, ecapa_loss=0.0001692, whisper_loss=0.07877, over 23716.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01147, ecapa_loss=0.0001985, whisper_loss=0.09339, over 3848980.62 frames. ], batch size: 94, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:45:01,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.754e+01 3.078e+01 3.674e+01 5.893e+01, threshold=6.156e+01, percent-clipped=1.0 2024-08-11 11:45:06,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1078970.0, ans=0.125 2024-08-11 11:45:11,192 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 11:45:15,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2024-08-11 11:45:29,566 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 34 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-11 11:45:53,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-08-11 11:45:55,466 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-11 11:45:56,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1079370.0, ans=0.2 2024-08-11 11:46:08,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6500, loss[loss=0.1022, beats_loss=0.01334, ecapa_loss=0.0001707, whisper_loss=0.08716, over 21047.00 frames. 
], tot_loss[loss=0.1071, beats_loss=0.01141, ecapa_loss=0.0001973, whisper_loss=0.09368, over 3880165.92 frames. ], batch size: 85, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:46:21,032 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.083e-02 2024-08-11 11:46:27,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.41 vs. limit=22.5 2024-08-11 11:46:41,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1079670.0, ans=0.0 2024-08-11 11:46:44,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1079670.0, ans=0.05 2024-08-11 11:47:02,335 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 11:47:20,290 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6550, loss[loss=0.1078, beats_loss=0.01054, ecapa_loss=0.0001764, whisper_loss=0.09554, over 19321.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01142, ecapa_loss=0.0001978, whisper_loss=0.09393, over 3890583.55 frames. ], batch size: 75, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:47:27,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.781e+01 3.122e+01 3.450e+01 5.322e+01, threshold=6.243e+01, percent-clipped=0.0 2024-08-11 11:47:29,283 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-11 11:47:48,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-11 11:47:51,454 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-11 11:48:00,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1080170.0, ans=0.5 2024-08-11 11:48:00,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1080170.0, ans=0.125 2024-08-11 11:48:18,754 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-11 11:48:23,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1080370.0, ans=0.125 2024-08-11 11:48:37,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6600, loss[loss=0.1097, beats_loss=0.01155, ecapa_loss=0.0001721, whisper_loss=0.09642, over 22033.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01139, ecapa_loss=0.0001988, whisper_loss=0.09363, over 3887873.31 frames. ], batch size: 86, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:48:37,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1080470.0, ans=0.1 2024-08-11 11:48:58,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1080570.0, ans=0.125 2024-08-11 11:49:08,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1080670.0, ans=0.05 2024-08-11 11:49:11,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1080670.0, ans=0.09899494936611666 2024-08-11 11:49:16,864 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 11:49:29,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1080770.0, ans=0.0 2024-08-11 11:49:50,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6650, loss[loss=0.1013, beats_loss=0.0122, ecapa_loss=0.0002133, whisper_loss=0.08697, over 15231.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0113, ecapa_loss=0.0002007, whisper_loss=0.09393, over 3892496.20 frames. ], batch size: 61, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:49:53,521 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 11:49:54,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.681e+01 2.981e+01 3.448e+01 5.241e+01, threshold=5.962e+01, percent-clipped=0.0 2024-08-11 11:49:55,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1080970.0, ans=0.0 2024-08-11 11:49:57,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1080970.0, ans=0.1 2024-08-11 11:49:59,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0 2024-08-11 11:50:00,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1080970.0, ans=0.125 2024-08-11 11:50:03,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1081070.0, ans=0.125 2024-08-11 11:50:08,809 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 11:50:13,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.23 vs. 
limit=22.5 2024-08-11 11:50:18,430 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 11:50:23,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=15.0 2024-08-11 11:50:29,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1081170.0, ans=0.125 2024-08-11 11:50:37,019 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 11:50:39,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1081270.0, ans=0.0 2024-08-11 11:50:40,940 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 11:50:45,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1081270.0, ans=0.0 2024-08-11 11:50:48,223 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 11:50:58,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1081370.0, ans=0.125 2024-08-11 11:51:00,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=1081470.0, ans=0.2 2024-08-11 11:51:01,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6700, loss[loss=0.1167, beats_loss=0.008333, ecapa_loss=0.0002355, whisper_loss=0.106, over 18289.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01126, ecapa_loss=0.0001998, whisper_loss=0.09428, over 3880384.06 frames. 
], batch size: 74, lr: 7.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:51:15,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1081570.0, ans=0.125 2024-08-11 11:51:27,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1081570.0, ans=0.07 2024-08-11 11:51:29,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-11 11:51:58,887 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 11:51:59,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1081870.0, ans=0.125 2024-08-11 11:52:14,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6750, loss[loss=0.1177, beats_loss=0.008394, ecapa_loss=0.0002358, whisper_loss=0.1069, over 17740.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01123, ecapa_loss=0.0002001, whisper_loss=0.09386, over 3879352.66 frames. ], batch size: 70, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:52:18,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.942e+01 3.557e+01 4.197e+01 2.407e+02, threshold=7.114e+01, percent-clipped=7.0 2024-08-11 11:52:26,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1081970.0, ans=0.0 2024-08-11 11:52:28,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2024-08-11 11:52:29,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=15.0 2024-08-11 11:52:40,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082070.0, ans=0.1 2024-08-11 11:52:45,502 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 11:52:50,189 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 11:53:07,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1082270.0, ans=15.0 2024-08-11 11:53:25,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-08-11 11:53:27,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6800, loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.000243, whisper_loss=0.09102, over 16804.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01129, ecapa_loss=0.0002007, whisper_loss=0.09359, over 3864794.46 frames. ], batch size: 69, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:53:30,868 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 11:53:36,244 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 11:53:36,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1082470.0, ans=0.1 2024-08-11 11:53:47,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1082570.0, ans=0.2 2024-08-11 11:53:50,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082570.0, ans=0.1 2024-08-11 11:53:56,948 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 11:54:06,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1082670.0, ans=0.0 2024-08-11 11:54:13,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1082770.0, ans=0.125 2024-08-11 11:54:17,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1082770.0, ans=0.125 2024-08-11 11:54:20,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1082770.0, ans=0.125 2024-08-11 11:54:28,985 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 11:54:37,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.02 vs. limit=5.0 2024-08-11 11:54:39,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6850, loss[loss=0.1021, beats_loss=0.0129, ecapa_loss=0.0002138, whisper_loss=0.08707, over 21488.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01137, ecapa_loss=0.0002008, whisper_loss=0.0931, over 3867487.47 frames. ], batch size: 90, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:54:44,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.694e+01 2.999e+01 3.363e+01 5.238e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 11:54:50,001 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-11 11:55:03,443 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 11:55:13,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. 
limit=15.0 2024-08-11 11:55:19,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1083170.0, ans=0.0 2024-08-11 11:55:19,962 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2024-08-11 11:55:21,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1083170.0, ans=0.0 2024-08-11 11:55:21,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-11 11:55:36,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-11 11:55:48,472 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 11:55:49,792 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6900, loss[loss=0.1173, beats_loss=0.01034, ecapa_loss=0.0002343, whisper_loss=0.1046, over 14552.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01132, ecapa_loss=0.0002012, whisper_loss=0.09335, over 3859758.98 frames. ], batch size: 57, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:55:56,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1083470.0, ans=0.2 2024-08-11 11:56:15,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1083570.0, ans=0.125 2024-08-11 11:56:17,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1083670.0, ans=0.125 2024-08-11 11:56:18,438 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
10 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 11:56:27,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2024-08-11 11:56:37,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1083770.0, ans=0.0 2024-08-11 11:56:53,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1083870.0, ans=0.035 2024-08-11 11:56:56,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1083970.0, ans=0.0 2024-08-11 11:56:57,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 6950, loss[loss=0.1073, beats_loss=0.009344, ecapa_loss=0.0002131, whisper_loss=0.09578, over 22288.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01141, ecapa_loss=0.0002002, whisper_loss=0.09261, over 3841525.02 frames. ], batch size: 88, lr: 7.91e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:57:00,430 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-11 11:57:01,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.671e+01 2.938e+01 3.749e+01 5.482e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-11 11:57:30,078 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-11 11:57:32,502 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-11 11:57:48,549 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
12 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 11:57:48,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1084270.0, ans=0.0 2024-08-11 11:57:56,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1084370.0, ans=0.125 2024-08-11 11:58:01,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1084370.0, ans=10.0 2024-08-11 11:58:03,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1084470.0, ans=0.125 2024-08-11 11:58:04,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7000, loss[loss=0.1265, beats_loss=0.00876, ecapa_loss=0.0002291, whisper_loss=0.1154, over 22297.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01136, ecapa_loss=0.0002006, whisper_loss=0.09197, over 3820008.79 frames. ], batch size: 89, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:58:05,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1084470.0, ans=0.125 2024-08-11 11:58:10,639 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2024-08-11 11:58:18,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1084570.0, ans=0.0 2024-08-11 11:58:35,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2024-08-11 11:58:38,469 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
38 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 11:58:59,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1084870.0, ans=0.125 2024-08-11 11:59:11,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7050, loss[loss=0.1099, beats_loss=0.01427, ecapa_loss=0.000151, whisper_loss=0.09409, over 22064.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01135, ecapa_loss=0.0002013, whisper_loss=0.09249, over 3837625.00 frames. ], batch size: 86, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 11:59:13,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1084970.0, ans=0.2 2024-08-11 11:59:15,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.647e+01 2.921e+01 3.539e+01 5.654e+01, threshold=5.842e+01, percent-clipped=0.0 2024-08-11 11:59:17,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1084970.0, ans=0.125 2024-08-11 11:59:33,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1085070.0, ans=0.125 2024-08-11 11:59:59,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1085270.0, ans=0.125 2024-08-11 12:00:00,906 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.196e-01 2024-08-11 12:00:01,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-11 12:00:12,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1085370.0, ans=0.95 2024-08-11 12:00:19,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7100, loss[loss=0.1124, beats_loss=0.0117, ecapa_loss=0.0002395, whisper_loss=0.09832, over 21605.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01132, ecapa_loss=0.000201, whisper_loss=0.09265, over 3826494.39 frames. ], batch size: 93, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:00:24,942 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 12:00:38,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-11 12:00:55,593 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:00:59,559 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0 2024-08-11 12:01:02,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1085770.0, ans=0.1 2024-08-11 12:01:11,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1085870.0, ans=0.5 2024-08-11 12:01:12,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-11 12:01:16,119 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 12:01:16,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2024-08-11 12:01:25,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7150, loss[loss=0.09768, beats_loss=0.01153, ecapa_loss=0.0001965, whisper_loss=0.08419, over 21486.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01127, ecapa_loss=0.0002009, whisper_loss=0.09297, over 3838886.35 frames. ], batch size: 89, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:01:28,544 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 12:01:29,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.825e+01 3.133e+01 3.530e+01 6.975e+01, threshold=6.267e+01, percent-clipped=1.0 2024-08-11 12:01:33,742 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 12:01:39,180 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 12:01:44,151 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
34 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 12:01:49,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1086070.0, ans=0.125 2024-08-11 12:02:00,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1086170.0, ans=0.125 2024-08-11 12:02:16,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1086270.0, ans=0.125 2024-08-11 12:02:21,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1086370.0, ans=0.125 2024-08-11 12:02:23,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1086370.0, ans=0.125 2024-08-11 12:02:26,084 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 12:02:29,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1086370.0, ans=0.07 2024-08-11 12:02:32,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7200, loss[loss=0.08436, beats_loss=0.01323, ecapa_loss=0.0002408, whisper_loss=0.06873, over 17228.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01124, ecapa_loss=0.0002001, whisper_loss=0.09346, over 3846170.35 frames. ], batch size: 75, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:02:35,153 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 12:02:43,651 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 12:02:52,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.82 vs. 
limit=15.0 2024-08-11 12:03:02,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1086670.0, ans=0.07 2024-08-11 12:03:05,342 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:03:06,342 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 12:03:12,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1086770.0, ans=0.0 2024-08-11 12:03:21,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1086770.0, ans=0.125 2024-08-11 12:03:32,800 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 12:03:35,170 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 12:03:40,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7250, loss[loss=0.09773, beats_loss=0.01091, ecapa_loss=0.0002164, whisper_loss=0.08466, over 22226.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01127, ecapa_loss=0.0002009, whisper_loss=0.09304, over 3846080.58 frames. 
], batch size: 92, lr: 7.90e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:03:44,514 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.767e+01 3.129e+01 3.597e+01 6.037e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 12:04:14,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1087170.0, ans=0.125 2024-08-11 12:04:16,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1087170.0, ans=0.1 2024-08-11 12:04:47,608 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7300, loss[loss=0.1274, beats_loss=0.01017, ecapa_loss=0.0001962, whisper_loss=0.1153, over 23425.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01125, ecapa_loss=0.0002011, whisper_loss=0.09299, over 3854916.40 frames. ], batch size: 89, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:05:03,674 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 12:05:27,397 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 12:05:39,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.46 vs. limit=22.5 2024-08-11 12:05:42,843 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 12:05:50,473 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-11 12:05:55,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7350, loss[loss=0.1105, beats_loss=0.0125, ecapa_loss=0.000191, whisper_loss=0.09605, over 19138.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01122, ecapa_loss=0.0002016, whisper_loss=0.09342, over 3843626.90 frames. 
], batch size: 75, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:05:56,077 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 12:05:58,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1087970.0, ans=0.125 2024-08-11 12:05:59,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.629e+01 2.975e+01 3.413e+01 5.829e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 12:06:16,436 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 12:06:21,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-11 12:06:23,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1088170.0, ans=0.125 2024-08-11 12:06:53,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1088370.0, ans=0.125 2024-08-11 12:07:03,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7400, loss[loss=0.1239, beats_loss=0.008633, ecapa_loss=0.0002511, whisper_loss=0.1128, over 16171.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.0002018, whisper_loss=0.09314, over 3833159.86 frames. ], batch size: 67, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:07:03,785 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
9 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 12:07:12,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1088470.0, ans=0.025 2024-08-11 12:07:42,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1088770.0, ans=0.1 2024-08-11 12:07:46,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1088770.0, ans=0.0 2024-08-11 12:07:46,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-08-11 12:07:54,269 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 12:08:10,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7450, loss[loss=0.1253, beats_loss=0.008505, ecapa_loss=0.0002485, whisper_loss=0.1143, over 13914.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0001997, whisper_loss=0.0928, over 3853465.81 frames. ], batch size: 55, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:08:14,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.716e+01 3.101e+01 3.669e+01 6.917e+01, threshold=6.202e+01, percent-clipped=1.0 2024-08-11 12:08:16,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.90 vs. limit=10.0 2024-08-11 12:08:26,489 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 15 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 12:08:37,390 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 12:08:43,968 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 12:08:59,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1089270.0, ans=0.2 2024-08-11 12:09:12,205 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:09:21,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7500, loss[loss=0.08086, beats_loss=0.01096, ecapa_loss=0.0002154, whisper_loss=0.06774, over 15715.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01127, ecapa_loss=0.0002004, whisper_loss=0.09323, over 3855589.14 frames. ], batch size: 65, lr: 7.89e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:09:31,226 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 12:09:35,233 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 12:09:44,488 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-11 12:09:46,780 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 12:09:48,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-08-11 12:09:58,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1089670.0, ans=0.125 2024-08-11 12:10:13,706 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 12:10:17,619 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 12:10:29,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.36 vs. limit=15.0 2024-08-11 12:10:31,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1089970.0, ans=0.1 2024-08-11 12:10:32,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7550, loss[loss=0.1015, beats_loss=0.009011, ecapa_loss=0.0001948, whisper_loss=0.09054, over 15317.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01129, ecapa_loss=0.0002004, whisper_loss=0.09244, over 3842173.36 frames. ], batch size: 58, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:10:36,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.654e+01 2.939e+01 3.334e+01 5.450e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-11 12:10:54,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1090070.0, ans=0.125 2024-08-11 12:10:57,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-08-11 12:11:03,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2024-08-11 12:11:18,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-11 12:11:24,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1090270.0, ans=0.2 2024-08-11 12:11:25,804 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-11 12:11:26,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1090270.0, ans=0.0 2024-08-11 12:11:27,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0 2024-08-11 12:11:34,341 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-11 12:11:37,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1090370.0, ans=0.05 2024-08-11 12:11:44,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7600, loss[loss=0.1043, beats_loss=0.01174, ecapa_loss=0.0002308, whisper_loss=0.09021, over 21364.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.0002014, whisper_loss=0.09274, over 3846405.66 frames. ], batch size: 86, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:12:14,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090670.0, ans=0.125 2024-08-11 12:12:32,708 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.093e-01 2024-08-11 12:12:34,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1090770.0, ans=0.125 2024-08-11 12:12:50,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-11 12:12:52,356 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7650, loss[loss=0.1142, beats_loss=0.01121, ecapa_loss=0.0001692, whisper_loss=0.1013, over 22423.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0112, ecapa_loss=0.0002014, whisper_loss=0.09277, over 3860190.81 frames. 
], batch size: 84, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:12:56,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.822e+01 3.132e+01 3.571e+01 5.523e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 12:13:04,937 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 12:13:09,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1091070.0, ans=22.5 2024-08-11 12:13:18,365 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 12 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 12:13:45,232 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 12:13:51,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.25 vs. limit=15.0 2024-08-11 12:13:56,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1091370.0, ans=0.0 2024-08-11 12:13:59,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7700, loss[loss=0.1146, beats_loss=0.009832, ecapa_loss=0.0002269, whisper_loss=0.1025, over 15603.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0002013, whisper_loss=0.09299, over 3874521.56 frames. ], batch size: 62, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:14:00,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1091470.0, ans=0.0 2024-08-11 12:14:01,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=12.0 2024-08-11 12:14:05,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1091470.0, ans=0.125 2024-08-11 12:14:10,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1091470.0, ans=0.1 2024-08-11 12:14:31,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=12.0 2024-08-11 12:14:55,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2024-08-11 12:14:55,711 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 12:14:58,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2024-08-11 12:15:03,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1091870.0, ans=0.125 2024-08-11 12:15:05,933 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7750, loss[loss=0.09502, beats_loss=0.01263, ecapa_loss=0.0002025, whisper_loss=0.08036, over 22007.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01126, ecapa_loss=0.0002001, whisper_loss=0.09292, over 3891464.97 frames. 
], batch size: 90, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:15:06,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1091970.0, ans=0.0 2024-08-11 12:15:10,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.756e+01 3.140e+01 3.838e+01 1.235e+02, threshold=6.279e+01, percent-clipped=2.0 2024-08-11 12:15:15,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1091970.0, ans=0.125 2024-08-11 12:15:15,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1091970.0, ans=0.07 2024-08-11 12:15:17,955 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 12:15:20,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1092070.0, ans=0.125 2024-08-11 12:15:42,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1092170.0, ans=0.125 2024-08-11 12:15:48,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1092270.0, ans=0.0 2024-08-11 12:15:56,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-08-11 12:16:10,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7800, loss[loss=0.1042, beats_loss=0.00956, ecapa_loss=0.0002356, whisper_loss=0.09226, over 14286.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001987, whisper_loss=0.09263, over 3907921.38 frames. 
], batch size: 57, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:16:20,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1092470.0, ans=0.125 2024-08-11 12:17:05,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-11 12:17:11,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1092870.0, ans=0.0 2024-08-11 12:17:17,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7850, loss[loss=0.1228, beats_loss=0.01038, ecapa_loss=0.000205, whisper_loss=0.1104, over 17213.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01136, ecapa_loss=0.000199, whisper_loss=0.09321, over 3899883.33 frames. ], batch size: 68, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:17:20,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1092970.0, ans=0.2 2024-08-11 12:17:21,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.734e+01 3.036e+01 3.446e+01 5.621e+01, threshold=6.073e+01, percent-clipped=0.0 2024-08-11 12:17:36,650 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 12:17:41,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1093070.0, ans=22.5 2024-08-11 12:18:11,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1093370.0, ans=0.125 2024-08-11 12:18:24,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7900, loss[loss=0.121, beats_loss=0.009852, ecapa_loss=0.0002071, whisper_loss=0.1091, over 21963.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01138, ecapa_loss=0.0001981, whisper_loss=0.09381, over 3919475.42 frames. ], batch size: 88, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:18:29,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1093470.0, ans=0.125 2024-08-11 12:18:47,028 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 12:19:09,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1093770.0, ans=0.0 2024-08-11 12:19:24,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1093870.0, ans=0.0 2024-08-11 12:19:25,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1093870.0, ans=0.125 2024-08-11 12:19:29,586 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-11 12:19:30,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 7950, loss[loss=0.08929, beats_loss=0.0083, ecapa_loss=0.0002933, whisper_loss=0.07806, over 14366.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01138, ecapa_loss=0.0001998, whisper_loss=0.09332, over 3911120.91 frames. 
], batch size: 59, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:19:32,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1093970.0, ans=0.125 2024-08-11 12:19:34,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.750e+01 3.082e+01 3.483e+01 5.642e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-11 12:19:52,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1094070.0, ans=0.1 2024-08-11 12:20:13,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1094270.0, ans=0.125 2024-08-11 12:20:37,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8000, loss[loss=0.09368, beats_loss=0.01232, ecapa_loss=0.00025, whisper_loss=0.07886, over 22313.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01142, ecapa_loss=0.0001983, whisper_loss=0.09272, over 3883160.90 frames. ], batch size: 96, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:20:41,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-08-11 12:20:46,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1094470.0, ans=0.04949747468305833 2024-08-11 12:21:05,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1094670.0, ans=0.125 2024-08-11 12:21:07,511 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 12:21:13,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.19 vs. 
limit=15.0 2024-08-11 12:21:19,695 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-11 12:21:28,817 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 12 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 12:21:32,941 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 12:21:33,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1094870.0, ans=0.125 2024-08-11 12:21:41,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1094870.0, ans=0.0 2024-08-11 12:21:44,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8050, loss[loss=0.1163, beats_loss=0.01121, ecapa_loss=0.000169, whisper_loss=0.1034, over 23221.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01138, ecapa_loss=0.0001977, whisper_loss=0.09262, over 3876682.07 frames. 
], batch size: 90, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:21:48,659 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.683e+01 3.112e+01 3.562e+01 5.362e+01, threshold=6.224e+01, percent-clipped=0.0 2024-08-11 12:21:54,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1094970.0, ans=0.125 2024-08-11 12:22:04,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1095070.0, ans=0.05 2024-08-11 12:22:08,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1095070.0, ans=0.125 2024-08-11 12:22:10,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1095070.0, ans=0.0 2024-08-11 12:22:22,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1095170.0, ans=0.125 2024-08-11 12:22:24,562 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-11 12:22:31,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1095270.0, ans=0.2 2024-08-11 12:22:33,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-08-11 12:22:40,790 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.766e-01 2024-08-11 12:22:43,322 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
24 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-11 12:22:43,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1095370.0, ans=0.125 2024-08-11 12:22:52,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8100, loss[loss=0.09845, beats_loss=0.01026, ecapa_loss=0.0002076, whisper_loss=0.08612, over 21247.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01134, ecapa_loss=0.0001987, whisper_loss=0.09271, over 3862957.31 frames. ], batch size: 86, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:22:53,856 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 12:22:54,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1095470.0, ans=0.2 2024-08-11 12:22:59,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1095470.0, ans=0.2 2024-08-11 12:23:01,702 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 12:23:08,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1095570.0, ans=0.0 2024-08-11 12:23:13,860 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 12:23:21,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1095670.0, ans=0.125 2024-08-11 12:23:28,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. 
limit=10.0 2024-08-11 12:23:29,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1095670.0, ans=0.0 2024-08-11 12:23:32,159 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 12:23:48,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1095870.0, ans=0.0 2024-08-11 12:23:58,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8150, loss[loss=0.09411, beats_loss=0.01463, ecapa_loss=0.000164, whisper_loss=0.07784, over 22410.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01132, ecapa_loss=0.0001983, whisper_loss=0.09235, over 3857183.70 frames. ], batch size: 92, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:24:03,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.662e+01 2.951e+01 3.382e+01 5.794e+01, threshold=5.903e+01, percent-clipped=0.0 2024-08-11 12:24:07,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1095970.0, ans=0.0 2024-08-11 12:24:25,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1096170.0, ans=0.125 2024-08-11 12:24:42,590 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 12:24:45,183 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 12:25:04,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1096370.0, ans=0.1 2024-08-11 12:25:06,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8200, loss[loss=0.1199, beats_loss=0.009473, ecapa_loss=0.0001887, whisper_loss=0.1085, over 22455.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01134, ecapa_loss=0.0001993, whisper_loss=0.09211, over 3871126.59 frames. ], batch size: 88, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:25:11,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2024-08-11 12:25:14,500 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 12:25:25,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1096570.0, ans=0.2 2024-08-11 12:25:25,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-08-11 12:25:27,654 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 12:25:35,895 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 12:25:39,928 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 12:26:02,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1096870.0, ans=0.2 2024-08-11 12:26:12,520 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8250, loss[loss=0.1114, beats_loss=0.009643, ecapa_loss=0.0002559, whisper_loss=0.09916, over 15960.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0001988, whisper_loss=0.09287, over 3889847.80 frames. 
], batch size: 64, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:26:16,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.782e+01 3.103e+01 3.474e+01 6.879e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 12:27:07,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1097370.0, ans=0.0 2024-08-11 12:27:19,914 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8300, loss[loss=0.0963, beats_loss=0.01123, ecapa_loss=0.0001833, whisper_loss=0.08323, over 18878.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01148, ecapa_loss=0.0001974, whisper_loss=0.09216, over 3880545.72 frames. ], batch size: 75, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:27:20,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1097470.0, ans=0.125 2024-08-11 12:27:21,348 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 12:27:28,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1097470.0, ans=0.0 2024-08-11 12:27:35,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1097570.0, ans=0.125 2024-08-11 12:27:38,129 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 12:27:40,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. 
limit=15.0 2024-08-11 12:27:42,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1097570.0, ans=0.5 2024-08-11 12:27:49,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1097670.0, ans=10.0 2024-08-11 12:27:55,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1097670.0, ans=0.0 2024-08-11 12:27:58,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1097770.0, ans=0.2 2024-08-11 12:28:22,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1097870.0, ans=0.0 2024-08-11 12:28:25,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1097970.0, ans=0.125 2024-08-11 12:28:26,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8350, loss[loss=0.1256, beats_loss=0.01138, ecapa_loss=0.0002076, whisper_loss=0.1122, over 19931.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01142, ecapa_loss=0.0001979, whisper_loss=0.09297, over 3892468.10 frames. ], batch size: 80, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:28:30,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.714e+01 3.261e+01 3.683e+01 6.544e+01, threshold=6.523e+01, percent-clipped=1.0 2024-08-11 12:28:34,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1097970.0, ans=0.125 2024-08-11 12:28:58,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. 
limit=6.0 2024-08-11 12:29:21,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1098370.0, ans=0.125 2024-08-11 12:29:33,641 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-11 12:29:34,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8400, loss[loss=0.136, beats_loss=0.007302, ecapa_loss=0.0001915, whisper_loss=0.1268, over 16750.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01133, ecapa_loss=0.0001977, whisper_loss=0.09433, over 3888090.31 frames. ], batch size: 62, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:29:39,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1098470.0, ans=0.0 2024-08-11 12:29:43,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1098470.0, ans=0.09899494936611666 2024-08-11 12:29:56,646 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 12:30:06,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098670.0, ans=0.1 2024-08-11 12:30:24,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098770.0, ans=0.1 2024-08-11 12:30:26,748 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
33 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-11 12:30:26,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1098870.0, ans=0.125 2024-08-11 12:30:27,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1098870.0, ans=0.2 2024-08-11 12:30:29,296 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 12:30:39,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1098970.0, ans=0.0 2024-08-11 12:30:40,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8450, loss[loss=0.09489, beats_loss=0.01151, ecapa_loss=0.0002044, whisper_loss=0.08134, over 21952.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0112, ecapa_loss=0.000199, whisper_loss=0.09457, over 3900130.80 frames. ], batch size: 89, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:30:44,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.709e+01 3.054e+01 3.505e+01 4.740e+01, threshold=6.108e+01, percent-clipped=0.0 2024-08-11 12:30:46,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.76 vs. 
limit=10.0 2024-08-11 12:30:53,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1099070.0, ans=0.05 2024-08-11 12:30:53,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1099070.0, ans=0.125 2024-08-11 12:30:58,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1099070.0, ans=0.0 2024-08-11 12:31:01,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1099070.0, ans=0.0 2024-08-11 12:31:23,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1099270.0, ans=0.0 2024-08-11 12:31:25,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1099270.0, ans=0.0 2024-08-11 12:31:30,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-11 12:31:32,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-08-11 12:31:39,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1099370.0, ans=0.0 2024-08-11 12:31:41,661 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 12:31:45,252 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.79 vs. 
limit=22.5 2024-08-11 12:31:46,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1099470.0, ans=0.2 2024-08-11 12:31:46,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8500, loss[loss=0.09705, beats_loss=0.01118, ecapa_loss=0.000187, whisper_loss=0.084, over 18398.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01117, ecapa_loss=0.0001985, whisper_loss=0.09444, over 3890014.94 frames. ], batch size: 75, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:31:59,955 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=12.0 2024-08-11 12:32:17,944 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.300e-02 2024-08-11 12:32:38,082 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 12:32:39,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1099770.0, ans=0.1 2024-08-11 12:32:56,045 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8550, loss[loss=0.1173, beats_loss=0.008781, ecapa_loss=0.0002088, whisper_loss=0.1065, over 19319.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01115, ecapa_loss=0.0001997, whisper_loss=0.09466, over 3908903.23 frames. ], batch size: 71, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:32:59,028 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 12:33:00,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.728e+01 3.009e+01 3.613e+01 5.860e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 12:33:04,966 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
24 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 12:33:08,193 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 12:33:09,465 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 12:33:14,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1100070.0, ans=0.2 2024-08-11 12:33:15,474 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.292e-02 2024-08-11 12:33:37,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1100170.0, ans=0.0 2024-08-11 12:33:53,876 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 12:33:57,110 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 12:34:11,114 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8600, loss[loss=0.07977, beats_loss=0.01541, ecapa_loss=0.0001784, whisper_loss=0.06258, over 15693.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01119, ecapa_loss=0.0001992, whisper_loss=0.09457, over 3908158.67 frames. ], batch size: 63, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:34:14,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1100470.0, ans=0.2 2024-08-11 12:34:19,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1100470.0, ans=0.125 2024-08-11 12:34:22,392 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 12:34:31,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1100570.0, ans=0.0 2024-08-11 12:34:58,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1100770.0, ans=0.0 2024-08-11 12:35:02,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1100770.0, ans=0.125 2024-08-11 12:35:03,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1100770.0, ans=0.125 2024-08-11 12:35:07,932 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 12:35:16,565 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 12:35:17,781 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 12:35:19,371 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 12:35:20,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1100870.0, ans=0.125 2024-08-11 12:35:26,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8650, loss[loss=0.11, beats_loss=0.01128, ecapa_loss=0.0001631, whisper_loss=0.09711, over 21230.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01131, ecapa_loss=0.0001985, whisper_loss=0.09388, over 3901408.19 frames. 
], batch size: 85, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:35:28,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1100970.0, ans=0.0 2024-08-11 12:35:31,151 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.631e+01 2.958e+01 3.559e+01 6.258e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-11 12:35:39,025 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-11 12:36:00,802 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 12:36:06,527 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 12:36:27,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1101270.0, ans=0.2 2024-08-11 12:36:47,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8700, loss[loss=0.1376, beats_loss=0.008569, ecapa_loss=0.0002105, whisper_loss=0.1269, over 15916.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01122, ecapa_loss=0.0002005, whisper_loss=0.09347, over 3873282.03 frames. ], batch size: 60, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:36:58,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. 
limit=22.5 2024-08-11 12:36:59,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1101470.0, ans=0.125 2024-08-11 12:37:01,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1101470.0, ans=0.1 2024-08-11 12:37:07,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1101570.0, ans=0.0 2024-08-11 12:37:21,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-08-11 12:37:34,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1101670.0, ans=0.1 2024-08-11 12:37:47,153 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-11 12:37:55,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-11 12:38:11,084 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8750, loss[loss=0.1087, beats_loss=0.01343, ecapa_loss=0.0002027, whisper_loss=0.09326, over 21831.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01131, ecapa_loss=0.0001998, whisper_loss=0.09323, over 3854363.92 frames. ], batch size: 89, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:38:15,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.807e+01 3.199e+01 3.848e+01 5.840e+01, threshold=6.398e+01, percent-clipped=0.0 2024-08-11 12:38:22,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.30 vs. 
limit=15.0 2024-08-11 12:38:30,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2024-08-11 12:38:31,100 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 35 from Vox, 24 fro AS 2024-08-11 12:38:40,363 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:39:01,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1102270.0, ans=0.125 2024-08-11 12:39:11,645 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 12:39:11,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1102370.0, ans=0.0 2024-08-11 12:39:23,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102370.0, ans=0.1 2024-08-11 12:39:25,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8800, loss[loss=0.1082, beats_loss=0.01179, ecapa_loss=0.0001458, whisper_loss=0.09496, over 21637.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01132, ecapa_loss=0.0001994, whisper_loss=0.09397, over 3871398.43 frames. ], batch size: 82, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:39:30,502 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 12:39:53,619 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2024-08-11 12:40:01,620 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 12:40:03,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1102670.0, ans=0.125 2024-08-11 12:40:22,918 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 12:40:24,322 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 12:40:44,034 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8850, loss[loss=0.1174, beats_loss=0.01064, ecapa_loss=0.000178, whisper_loss=0.105, over 17403.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01133, ecapa_loss=0.0001979, whisper_loss=0.09394, over 3892637.07 frames. ], batch size: 69, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:40:47,121 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 12:40:48,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.779e+01 3.220e+01 3.967e+01 6.531e+01, threshold=6.439e+01, percent-clipped=1.0 2024-08-11 12:41:01,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1103070.0, ans=0.125 2024-08-11 12:41:18,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1103170.0, ans=0.2 2024-08-11 12:41:18,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1103170.0, ans=0.0 2024-08-11 12:41:19,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1103170.0, ans=0.2 2024-08-11 12:41:49,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1103370.0, ans=0.125 2024-08-11 12:41:51,232 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 12:41:53,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2024-08-11 12:42:06,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8900, loss[loss=0.1137, beats_loss=0.009533, ecapa_loss=0.0002074, whisper_loss=0.1021, over 18272.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01133, ecapa_loss=0.0001973, whisper_loss=0.09368, over 3873222.55 frames. ], batch size: 71, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:42:08,055 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 12:42:16,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1103470.0, ans=0.0 2024-08-11 12:42:43,894 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 12:43:10,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1103870.0, ans=0.125 2024-08-11 12:43:21,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1103870.0, ans=0.0 2024-08-11 12:43:24,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 8950, loss[loss=0.1405, beats_loss=0.01052, ecapa_loss=0.0001694, whisper_loss=0.1282, over 22173.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01133, ecapa_loss=0.0001972, whisper_loss=0.09388, over 3882657.27 frames. 
], batch size: 83, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:43:26,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1103970.0, ans=0.125 2024-08-11 12:43:28,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.747e+01 3.145e+01 3.619e+01 5.572e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 12:43:39,726 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 12:44:03,048 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 12:44:03,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1104170.0, ans=0.125 2024-08-11 12:44:04,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1104170.0, ans=0.2 2024-08-11 12:44:31,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1104370.0, ans=0.2 2024-08-11 12:44:38,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-08-11 12:44:39,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9000, loss[loss=0.1155, beats_loss=0.01024, ecapa_loss=0.0002011, whisper_loss=0.1032, over 15892.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0001971, whisper_loss=0.09344, over 3858744.39 frames. ], batch size: 63, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:44:39,433 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 12:45:15,347 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on ASR_libri: loss=0.2575, beats_loss=0, ecapa_loss=0.0006551, whisper_loss=0.2509, over 922467.00 frames. 
2024-08-11 12:45:34,018 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on SV_voxceleb1: loss=0.005315, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0, over 939242.00 frames. 2024-08-11 12:46:37,827 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.5767, 3.0002, 3.1655, 3.3231], device='cuda:1') 2024-08-11 12:47:19,753 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on AT_audioset: loss=0.02529, beats_loss=0.02529, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 12:47:19,757 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 12:47:24,052 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 12:47:26,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1104470.0, ans=0.0 2024-08-11 12:47:28,458 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 12:47:55,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1104670.0, ans=0.125 2024-08-11 12:47:55,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. 
limit=22.5 2024-08-11 12:47:57,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1104670.0, ans=0.04949747468305833 2024-08-11 12:47:59,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1104670.0, ans=0.125 2024-08-11 12:48:09,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1104770.0, ans=0.0 2024-08-11 12:48:09,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1104770.0, ans=0.125 2024-08-11 12:48:16,781 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 12:48:19,858 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-11 12:48:28,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1104870.0, ans=0.125 2024-08-11 12:48:28,829 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 12:48:36,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9050, loss[loss=0.08737, beats_loss=0.01171, ecapa_loss=0.0002529, whisper_loss=0.07312, over 18351.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0001983, whisper_loss=0.09359, over 3854335.74 frames. ], batch size: 79, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:48:39,671 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 12:48:41,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.751e+01 3.167e+01 3.446e+01 7.186e+01, threshold=6.334e+01, percent-clipped=1.0 2024-08-11 12:48:42,524 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
25 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-11 12:48:42,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104970.0, ans=0.1 2024-08-11 12:48:51,418 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-11 12:49:30,454 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 12:49:46,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-11 12:49:47,102 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 12:49:52,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-11 12:49:53,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9100, loss[loss=0.109, beats_loss=0.01209, ecapa_loss=0.000203, whisper_loss=0.09486, over 20376.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01134, ecapa_loss=0.0001997, whisper_loss=0.09324, over 3829418.34 frames. ], batch size: 81, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:50:13,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-11 12:50:22,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1105670.0, ans=0.125 2024-08-11 12:50:30,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. 
limit=15.0 2024-08-11 12:50:38,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-08-11 12:50:43,968 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 12:50:44,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-11 12:50:50,152 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-11 12:51:02,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1105870.0, ans=0.2 2024-08-11 12:51:05,352 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 12:51:08,411 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 20 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-11 12:51:10,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9150, loss[loss=0.1289, beats_loss=0.01118, ecapa_loss=0.0001781, whisper_loss=0.1159, over 13597.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001979, whisper_loss=0.09345, over 3872412.46 frames. ], batch size: 53, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:51:12,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1105970.0, ans=10.0 2024-08-11 12:51:14,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.723e+01 3.003e+01 3.393e+01 4.790e+01, threshold=6.006e+01, percent-clipped=0.0 2024-08-11 12:51:25,729 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 12:51:27,312 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
27 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-11 12:51:27,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106070.0, ans=0.125 2024-08-11 12:51:31,733 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 12:51:33,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1106070.0, ans=0.5 2024-08-11 12:51:40,665 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 12:51:42,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1106170.0, ans=0.0 2024-08-11 12:51:44,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1106170.0, ans=0.2 2024-08-11 12:52:14,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106370.0, ans=0.125 2024-08-11 12:52:15,600 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 12:52:16,759 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 12:52:23,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-08-11 12:52:25,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9200, loss[loss=0.1004, beats_loss=0.01062, ecapa_loss=0.0001891, whisper_loss=0.0879, over 16304.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01136, ecapa_loss=0.0001976, whisper_loss=0.09308, over 3870924.87 frames. ], batch size: 64, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:52:29,190 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 12:52:29,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1106470.0, ans=0.125 2024-08-11 12:52:35,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1106470.0, ans=0.05 2024-08-11 12:52:37,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1106470.0, ans=0.0 2024-08-11 12:52:54,921 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2024-08-11 12:52:58,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106670.0, ans=0.125 2024-08-11 12:53:02,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-11 12:53:13,007 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 12:53:14,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1106770.0, ans=0.0 2024-08-11 12:53:19,759 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 12:53:24,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1106770.0, ans=0.0 2024-08-11 12:53:33,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1106870.0, ans=0.5 2024-08-11 12:53:42,597 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9250, loss[loss=0.1178, beats_loss=0.01108, ecapa_loss=0.0001928, whisper_loss=0.1048, over 20774.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.0114, ecapa_loss=0.0001986, whisper_loss=0.09264, over 3917757.54 frames. ], batch size: 81, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:53:47,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.771e+01 3.106e+01 3.599e+01 1.159e+02, threshold=6.212e+01, percent-clipped=1.0 2024-08-11 12:53:51,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1106970.0, ans=0.0 2024-08-11 12:53:56,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1107070.0, ans=0.125 2024-08-11 12:53:59,308 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 12:54:12,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1107170.0, ans=10.0 2024-08-11 12:54:25,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1107270.0, ans=0.0 2024-08-11 12:54:44,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1107370.0, ans=0.125 2024-08-11 12:54:52,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-11 12:54:57,053 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9300, loss[loss=0.1123, beats_loss=0.01026, ecapa_loss=0.0002107, whisper_loss=0.0999, over 15891.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0001983, whisper_loss=0.09294, over 3893008.62 frames. 
], batch size: 62, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:55:05,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-11 12:55:10,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1107470.0, ans=0.125 2024-08-11 12:55:11,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2024-08-11 12:55:16,420 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-11 12:55:34,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-08-11 12:55:37,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1107670.0, ans=0.125 2024-08-11 12:55:47,264 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-11 12:55:56,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1107870.0, ans=0.0 2024-08-11 12:56:08,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1107870.0, ans=0.125 2024-08-11 12:56:12,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9350, loss[loss=0.1165, beats_loss=0.009951, ecapa_loss=0.0001855, whisper_loss=0.1047, over 16066.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01132, ecapa_loss=0.0001982, whisper_loss=0.0931, over 3903218.20 frames. 
], batch size: 63, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:56:14,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1107970.0, ans=0.125 2024-08-11 12:56:17,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.772e+01 2.988e+01 3.438e+01 1.215e+02, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 12:56:29,800 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 12:56:35,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1108070.0, ans=0.1 2024-08-11 12:56:48,283 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 12:56:48,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1108170.0, ans=0.05 2024-08-11 12:57:06,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1108270.0, ans=0.1 2024-08-11 12:57:10,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1108270.0, ans=0.0 2024-08-11 12:57:13,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1108370.0, ans=0.125 2024-08-11 12:57:23,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1108370.0, ans=0.125 2024-08-11 12:57:24,806 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 9 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 12:57:28,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9400, loss[loss=0.09669, beats_loss=0.01068, ecapa_loss=0.0002149, whisper_loss=0.08385, over 22551.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0001996, whisper_loss=0.09285, over 3904012.04 frames. ], batch size: 89, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:58:04,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1108670.0, ans=0.05 2024-08-11 12:58:09,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1108670.0, ans=0.125 2024-08-11 12:58:29,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.22 vs. limit=15.0 2024-08-11 12:58:45,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9450, loss[loss=0.1045, beats_loss=0.01211, ecapa_loss=0.0002305, whisper_loss=0.09013, over 21435.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0002, whisper_loss=0.0932, over 3876881.82 frames. ], batch size: 91, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:58:50,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.669e+01 3.064e+01 3.549e+01 5.554e+01, threshold=6.127e+01, percent-clipped=0.0 2024-08-11 12:58:59,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1109070.0, ans=0.0 2024-08-11 12:59:15,858 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 12:59:22,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1109170.0, ans=0.2 2024-08-11 12:59:24,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. 
limit=10.0 2024-08-11 12:59:32,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1109270.0, ans=0.125 2024-08-11 12:59:41,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-11 12:59:51,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1109370.0, ans=0.0 2024-08-11 12:59:54,532 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 12:59:57,787 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 13:00:00,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9500, loss[loss=0.09056, beats_loss=0.01094, ecapa_loss=0.0002675, whisper_loss=0.07695, over 15618.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01137, ecapa_loss=0.0002012, whisper_loss=0.09311, over 3891969.02 frames. ], batch size: 69, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:00:01,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1109470.0, ans=0.125 2024-08-11 13:00:05,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-11 13:00:08,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0 2024-08-11 13:00:24,300 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 13:00:24,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1109570.0, ans=0.125 2024-08-11 13:00:28,522 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 13:00:36,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1109670.0, ans=0.0 2024-08-11 13:01:05,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1109870.0, ans=0.125 2024-08-11 13:01:09,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1109870.0, ans=0.0 2024-08-11 13:01:13,946 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9550, loss[loss=0.09056, beats_loss=0.01346, ecapa_loss=0.0001431, whisper_loss=0.07567, over 17294.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01141, ecapa_loss=0.0001993, whisper_loss=0.0922, over 3909033.00 frames. ], batch size: 65, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:01:17,150 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 13:01:18,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.581e+01 3.097e+01 3.550e+01 5.814e+01, threshold=6.195e+01, percent-clipped=0.0 2024-08-11 13:01:20,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1109970.0, ans=0.125 2024-08-11 13:01:36,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. 
limit=5.0 2024-08-11 13:01:39,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1110070.0, ans=0.125 2024-08-11 13:01:44,198 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 13:01:44,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1110170.0, ans=0.0 2024-08-11 13:01:46,061 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 13:02:14,405 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 13:02:25,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-08-11 13:02:27,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110470.0, ans=0.1 2024-08-11 13:02:28,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9600, loss[loss=0.1022, beats_loss=0.01124, ecapa_loss=0.0001726, whisper_loss=0.08919, over 16887.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01134, ecapa_loss=0.0001993, whisper_loss=0.09273, over 3889337.75 frames. 
], batch size: 63, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:02:30,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1110470.0, ans=0.125 2024-08-11 13:03:04,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1110670.0, ans=0.2 2024-08-11 13:03:12,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1110770.0, ans=0.2 2024-08-11 13:03:15,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2024-08-11 13:03:16,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2024-08-11 13:03:26,729 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 13:03:39,525 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9650, loss[loss=0.1312, beats_loss=0.008941, ecapa_loss=0.0002553, whisper_loss=0.1197, over 15364.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0001985, whisper_loss=0.09309, over 3887754.10 frames. ], batch size: 59, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:03:43,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.716e+01 3.002e+01 3.574e+01 5.577e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 13:03:44,954 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 13:04:08,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1111170.0, ans=0.0 2024-08-11 13:04:26,514 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-11 13:04:35,142 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 13:04:40,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-11 13:04:41,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1111370.0, ans=15.0 2024-08-11 13:04:50,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9700, loss[loss=0.1117, beats_loss=0.01209, ecapa_loss=0.0002172, whisper_loss=0.09746, over 21746.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01127, ecapa_loss=0.000201, whisper_loss=0.09212, over 3872616.74 frames. ], batch size: 91, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:05:01,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0 2024-08-11 13:05:02,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1111470.0, ans=0.0 2024-08-11 13:05:14,604 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 13:05:17,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1111570.0, ans=0.07 2024-08-11 13:05:33,123 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 13:05:41,260 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 13 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 13:05:46,811 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
15 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 13:05:53,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1111870.0, ans=0.125 2024-08-11 13:06:01,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1111870.0, ans=0.025 2024-08-11 13:06:03,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9750, loss[loss=0.1183, beats_loss=0.01067, ecapa_loss=0.0002522, whisper_loss=0.1051, over 21581.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0002015, whisper_loss=0.09281, over 3866133.56 frames. ], batch size: 92, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:06:08,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.596e+01 2.916e+01 3.374e+01 5.743e+01, threshold=5.832e+01, percent-clipped=0.0 2024-08-11 13:06:31,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1112070.0, ans=0.125 2024-08-11 13:06:34,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-11 13:06:38,677 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 13:06:38,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1112170.0, ans=0.125 2024-08-11 13:06:40,749 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. 
limit=15.0 2024-08-11 13:06:45,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1112170.0, ans=15.0 2024-08-11 13:06:46,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-11 13:06:57,057 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 13:07:00,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112270.0, ans=0.1 2024-08-11 13:07:10,699 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 13:07:17,345 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9800, loss[loss=0.1096, beats_loss=0.01103, ecapa_loss=0.0002124, whisper_loss=0.09646, over 19745.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.0002004, whisper_loss=0.09251, over 3859108.55 frames. ], batch size: 79, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:07:23,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112470.0, ans=0.1 2024-08-11 13:07:37,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1112570.0, ans=10.0 2024-08-11 13:07:44,635 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 13:08:00,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1112770.0, ans=0.125 2024-08-11 13:08:10,955 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 13:08:19,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1112870.0, ans=0.035 2024-08-11 13:08:20,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1112870.0, ans=0.0 2024-08-11 13:08:32,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9850, loss[loss=0.1191, beats_loss=0.009685, ecapa_loss=0.0002393, whisper_loss=0.107, over 18834.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0002004, whisper_loss=0.09278, over 3853267.38 frames. ], batch size: 74, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:08:37,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.640e+01 2.920e+01 3.284e+01 5.372e+01, threshold=5.839e+01, percent-clipped=0.0 2024-08-11 13:08:52,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-08-11 13:08:54,792 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 13:09:10,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-08-11 13:09:17,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1113170.0, ans=0.125 2024-08-11 13:09:22,979 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 13:09:24,403 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 13:09:35,449 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 13:09:43,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1113370.0, ans=0.0 2024-08-11 13:09:50,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9900, loss[loss=0.1001, beats_loss=0.009686, ecapa_loss=0.0002339, whisper_loss=0.08809, over 16234.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01136, ecapa_loss=0.0001989, whisper_loss=0.09274, over 3885104.36 frames. ], batch size: 67, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:09:57,045 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 13:09:57,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1113470.0, ans=0.125 2024-08-11 13:09:59,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1113470.0, ans=0.125 2024-08-11 13:10:01,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1113470.0, ans=0.1 2024-08-11 13:10:07,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1113570.0, ans=0.125 2024-08-11 13:10:08,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1113570.0, ans=0.2 2024-08-11 13:10:30,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=12.0 2024-08-11 13:10:34,909 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 13:10:39,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=22.5 2024-08-11 13:10:57,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1113770.0, ans=0.1 2024-08-11 13:11:10,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1113870.0, ans=0.125 2024-08-11 13:11:17,263 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.396e-01 2024-08-11 13:11:18,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 9950, loss[loss=0.1275, beats_loss=0.008349, ecapa_loss=0.000235, whisper_loss=0.1168, over 19200.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0001979, whisper_loss=0.09299, over 3882674.71 frames. ], batch size: 73, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:11:24,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.684e+01 2.921e+01 3.407e+01 1.322e+02, threshold=5.842e+01, percent-clipped=4.0 2024-08-11 13:11:46,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1114070.0, ans=0.2 2024-08-11 13:11:47,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-11 13:12:18,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1114270.0, ans=0.125 2024-08-11 13:12:23,530 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 13:12:28,855 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 13:12:31,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1114370.0, ans=0.125 2024-08-11 13:12:36,175 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 13:12:38,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1114370.0, ans=0.0 2024-08-11 13:12:48,465 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 13:12:50,191 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10000, loss[loss=0.1286, beats_loss=0.009181, ecapa_loss=0.0002798, whisper_loss=0.1166, over 18964.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0002001, whisper_loss=0.0928, over 3883883.06 frames. ], batch size: 76, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:12:55,983 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 13:13:13,743 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-11 13:13:43,681 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 13:13:49,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1114770.0, ans=0.1 2024-08-11 13:14:20,575 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10050, loss[loss=0.121, beats_loss=0.01106, ecapa_loss=0.0002002, whisper_loss=0.108, over 22254.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01131, ecapa_loss=0.000199, whisper_loss=0.09302, over 3866016.74 frames. 
], batch size: 89, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:14:26,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.703e+01 2.998e+01 3.429e+01 6.033e+01, threshold=5.996e+01, percent-clipped=1.0 2024-08-11 13:14:30,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1114970.0, ans=0.025 2024-08-11 13:14:39,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1115070.0, ans=0.2 2024-08-11 13:14:45,976 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 13:14:57,173 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 13:15:08,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1115170.0, ans=0.0 2024-08-11 13:15:15,413 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 13:15:27,722 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 13:15:53,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1115370.0, ans=0.0 2024-08-11 13:15:57,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10100, loss[loss=0.1131, beats_loss=0.01284, ecapa_loss=0.000192, whisper_loss=0.09829, over 14298.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01137, ecapa_loss=0.0001989, whisper_loss=0.09283, over 3894006.82 frames. ], batch size: 54, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:16:37,917 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 13:16:38,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1115570.0, ans=0.0 2024-08-11 13:17:03,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1115770.0, ans=0.125 2024-08-11 13:17:26,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1115870.0, ans=0.125 2024-08-11 13:17:34,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1115870.0, ans=0.125 2024-08-11 13:17:44,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1115970.0, ans=0.2 2024-08-11 13:17:45,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10150, loss[loss=0.1064, beats_loss=0.01238, ecapa_loss=0.0001518, whisper_loss=0.09249, over 17902.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01134, ecapa_loss=0.0002013, whisper_loss=0.09237, over 3880434.58 frames. ], batch size: 66, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:17:49,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.757e+01 3.072e+01 3.612e+01 1.119e+02, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:18:03,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.76 vs. 
limit=15.0 2024-08-11 13:18:08,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1116070.0, ans=0.125 2024-08-11 13:18:11,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1116070.0, ans=0.5 2024-08-11 13:18:40,744 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 13:19:00,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10200, loss[loss=0.1079, beats_loss=0.01226, ecapa_loss=0.0001984, whisper_loss=0.09361, over 21470.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0113, ecapa_loss=0.000202, whisper_loss=0.09278, over 3894227.69 frames. ], batch size: 85, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:19:11,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1116470.0, ans=0.1 2024-08-11 13:19:24,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1116570.0, ans=0.125 2024-08-11 13:19:33,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116670.0, ans=0.1 2024-08-11 13:19:34,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1116670.0, ans=0.07 2024-08-11 13:19:34,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1116670.0, ans=0.0 2024-08-11 13:19:35,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1116670.0, ans=0.125 2024-08-11 13:19:38,616 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 13:19:48,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1116770.0, ans=0.125 2024-08-11 13:19:48,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=12.0 2024-08-11 13:20:14,895 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 13:20:17,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=12.0 2024-08-11 13:20:17,922 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 13:20:19,133 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10250, loss[loss=0.08876, beats_loss=0.01305, ecapa_loss=0.0001866, whisper_loss=0.07384, over 21598.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0002014, whisper_loss=0.09311, over 3913782.88 frames. ], batch size: 87, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:20:23,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.664e+01 3.001e+01 3.567e+01 5.136e+01, threshold=6.003e+01, percent-clipped=0.0 2024-08-11 13:20:28,578 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 13:20:46,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1117070.0, ans=0.125 2024-08-11 13:20:53,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2024-08-11 13:21:04,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.76 vs. 
limit=12.0 2024-08-11 13:21:25,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2024-08-11 13:21:40,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10300, loss[loss=0.1008, beats_loss=0.0108, ecapa_loss=0.0002128, whisper_loss=0.08783, over 16393.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01135, ecapa_loss=0.0002021, whisper_loss=0.09278, over 3948962.90 frames. ], batch size: 66, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:21:57,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117570.0, ans=0.1 2024-08-11 13:22:32,504 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 13:22:35,556 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 13:22:35,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1117770.0, ans=0.125 2024-08-11 13:22:36,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-11 13:22:55,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-11 13:22:56,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.14 vs. 
limit=15.0 2024-08-11 13:22:58,282 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.561e-01 2024-08-11 13:22:58,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-11 13:22:59,333 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 13:23:01,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10350, loss[loss=0.1059, beats_loss=0.01082, ecapa_loss=0.0001944, whisper_loss=0.09317, over 20920.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01138, ecapa_loss=0.0002017, whisper_loss=0.0926, over 3948780.16 frames. ], batch size: 84, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:23:04,771 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 13:23:06,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.796e+01 3.108e+01 3.786e+01 6.316e+01, threshold=6.215e+01, percent-clipped=1.0 2024-08-11 13:23:27,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2024-08-11 13:23:37,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1118170.0, ans=0.1 2024-08-11 13:24:00,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:24:18,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10400, loss[loss=0.1105, beats_loss=0.01255, ecapa_loss=0.0001909, whisper_loss=0.09605, over 23078.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01139, ecapa_loss=0.0001988, whisper_loss=0.09304, over 3941580.26 frames. 
], batch size: 94, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:24:33,589 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 13:24:33,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1118570.0, ans=0.1 2024-08-11 13:24:33,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1118570.0, ans=0.125 2024-08-11 13:24:58,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-08-11 13:25:21,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1118870.0, ans=0.5 2024-08-11 13:25:22,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1118870.0, ans=0.0 2024-08-11 13:25:35,222 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10450, loss[loss=0.1265, beats_loss=0.009186, ecapa_loss=0.0001826, whisper_loss=0.1155, over 22023.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0114, ecapa_loss=0.0001978, whisper_loss=0.09234, over 3928171.70 frames. ], batch size: 84, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:25:39,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.711e+01 3.019e+01 3.517e+01 4.993e+01, threshold=6.039e+01, percent-clipped=0.0 2024-08-11 13:25:39,826 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 13:25:52,082 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 13:25:54,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. 
limit=6.0 2024-08-11 13:26:04,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1119170.0, ans=0.0 2024-08-11 13:26:21,665 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 13:26:32,781 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 13:26:40,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1119370.0, ans=0.1 2024-08-11 13:26:46,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-08-11 13:26:53,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10500, loss[loss=0.08895, beats_loss=0.01307, ecapa_loss=0.0002209, whisper_loss=0.07367, over 18162.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01139, ecapa_loss=0.0001986, whisper_loss=0.0925, over 3906215.30 frames. ], batch size: 78, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:26:54,013 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 14 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 13:26:55,483 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 13:27:02,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119470.0, ans=0.1 2024-08-11 13:27:03,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.50 vs. limit=10.0 2024-08-11 13:27:12,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1119570.0, ans=0.025 2024-08-11 13:27:30,918 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 13:27:32,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1119670.0, ans=0.0 2024-08-11 13:27:33,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1119670.0, ans=0.0 2024-08-11 13:27:36,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1119670.0, ans=0.125 2024-08-11 13:27:41,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1119770.0, ans=0.2 2024-08-11 13:27:59,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.29 vs. limit=22.5 2024-08-11 13:28:11,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10550, loss[loss=0.102, beats_loss=0.01254, ecapa_loss=0.0001828, whisper_loss=0.08767, over 19645.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0114, ecapa_loss=0.0001991, whisper_loss=0.09203, over 3866329.86 frames. ], batch size: 77, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:28:11,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. 
limit=15.0 2024-08-11 13:28:13,224 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.518e-01 2024-08-11 13:28:17,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.650e+01 3.072e+01 3.667e+01 9.491e+01, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:28:26,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1120070.0, ans=0.0 2024-08-11 13:28:28,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1120070.0, ans=0.0 2024-08-11 13:28:45,977 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 13:29:15,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.66 vs. limit=22.5 2024-08-11 13:29:18,121 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 13:29:19,650 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-11 13:29:33,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10600, loss[loss=0.09891, beats_loss=0.01411, ecapa_loss=0.0001488, whisper_loss=0.08332, over 18556.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01143, ecapa_loss=0.0001986, whisper_loss=0.0918, over 3857009.19 frames. ], batch size: 73, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:29:34,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2024-08-11 13:29:46,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1120470.0, ans=0.0 2024-08-11 13:29:47,593 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 13:30:00,458 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 13:30:21,140 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 13:30:25,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120770.0, ans=0.125 2024-08-11 13:30:50,464 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10650, loss[loss=0.1161, beats_loss=0.009417, ecapa_loss=0.0001916, whisper_loss=0.1048, over 16972.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01138, ecapa_loss=0.0001977, whisper_loss=0.09266, over 3854266.05 frames. ], batch size: 62, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:30:52,240 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 13:30:57,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.737e+01 3.110e+01 3.500e+01 6.521e+01, threshold=6.221e+01, percent-clipped=1.0 2024-08-11 13:31:03,639 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 30 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 13:31:12,459 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-11 13:31:16,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1121070.0, ans=0.0 2024-08-11 13:31:20,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1121170.0, ans=0.125 2024-08-11 13:31:22,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121170.0, ans=0.1 2024-08-11 13:31:48,490 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
18 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 13:32:10,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10700, loss[loss=0.1103, beats_loss=0.01345, ecapa_loss=0.0001511, whisper_loss=0.0953, over 20445.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01153, ecapa_loss=0.0001963, whisper_loss=0.09189, over 3860825.18 frames. ], batch size: 80, lr: 7.77e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:32:10,256 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 13:32:35,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1121570.0, ans=0.1 2024-08-11 13:32:44,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2024-08-11 13:32:53,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-08-11 13:33:01,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121770.0, ans=0.1 2024-08-11 13:33:31,034 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10750, loss[loss=0.0994, beats_loss=0.01096, ecapa_loss=0.000218, whisper_loss=0.08626, over 17260.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01149, ecapa_loss=0.000198, whisper_loss=0.09184, over 3857061.19 frames. 
], batch size: 71, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:33:38,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.776e+01 3.070e+01 3.397e+01 5.449e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-11 13:33:41,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121970.0, ans=0.1 2024-08-11 13:33:48,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1122070.0, ans=0.2 2024-08-11 13:33:55,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1122070.0, ans=0.125 2024-08-11 13:34:19,602 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:34:22,163 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 13:34:32,807 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 13:34:39,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1122370.0, ans=0.125 2024-08-11 13:34:49,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10800, loss[loss=0.09864, beats_loss=0.0126, ecapa_loss=0.0001998, whisper_loss=0.08405, over 19937.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01147, ecapa_loss=0.0001974, whisper_loss=0.09301, over 3913031.54 frames. ], batch size: 84, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:34:50,027 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 13 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 13:35:21,171 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 13:35:23,983 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 13:35:29,153 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 13:35:36,487 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 13:35:38,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1122770.0, ans=0.0 2024-08-11 13:35:47,031 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 13:35:53,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1122870.0, ans=0.05 2024-08-11 13:35:57,104 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 13:36:01,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1122870.0, ans=0.2 2024-08-11 13:36:07,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10850, loss[loss=0.09165, beats_loss=0.01335, ecapa_loss=0.0001644, whisper_loss=0.07666, over 17487.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01149, ecapa_loss=0.0001984, whisper_loss=0.09335, over 3921674.07 frames. ], batch size: 70, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:36:10,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-11 13:36:11,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. 
limit=15.0 2024-08-11 13:36:15,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.852e+01 3.448e+01 4.280e+01 7.389e+01, threshold=6.896e+01, percent-clipped=2.0 2024-08-11 13:36:17,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-11 13:36:30,541 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 13:36:33,804 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 13:36:56,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1123270.0, ans=0.2 2024-08-11 13:37:08,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1123270.0, ans=0.09899494936611666 2024-08-11 13:37:25,644 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:37:29,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10900, loss[loss=0.1326, beats_loss=0.009406, ecapa_loss=0.0001962, whisper_loss=0.1213, over 24639.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0114, ecapa_loss=0.0002, whisper_loss=0.09413, over 3952124.32 frames. ], batch size: 93, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:37:41,212 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 13:37:59,985 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:38:03,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1123670.0, ans=0.1 2024-08-11 13:38:13,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2024-08-11 13:38:32,042 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 13:38:45,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 10950, loss[loss=0.07405, beats_loss=0.01226, ecapa_loss=0.000208, whisper_loss=0.05972, over 14483.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0001992, whisper_loss=0.09383, over 3965388.88 frames. ], batch size: 58, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:38:48,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1123970.0, ans=0.0 2024-08-11 13:38:50,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1123970.0, ans=0.125 2024-08-11 13:38:52,479 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 13:38:53,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.774e+01 3.085e+01 3.666e+01 6.229e+01, threshold=6.171e+01, percent-clipped=0.0 2024-08-11 13:39:29,547 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
34 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 13:39:29,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1124170.0, ans=0.05 2024-08-11 13:40:03,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11000, loss[loss=0.1199, beats_loss=0.008278, ecapa_loss=0.0001942, whisper_loss=0.1097, over 18898.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01119, ecapa_loss=0.0002009, whisper_loss=0.09465, over 3954661.02 frames. ], batch size: 72, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:40:11,069 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 13:40:17,847 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-11 13:40:39,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2024-08-11 13:40:39,620 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 13:40:59,028 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-11 13:41:02,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1124770.0, ans=0.125 2024-08-11 13:41:05,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-11 13:41:22,317 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11050, loss[loss=0.1094, beats_loss=0.009599, ecapa_loss=0.0002177, whisper_loss=0.09758, over 21659.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01119, ecapa_loss=0.0002018, whisper_loss=0.09448, over 3949460.86 frames. 
], batch size: 88, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:41:25,527 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 17 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-11 13:41:29,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.704e+01 3.049e+01 3.665e+01 6.034e+01, threshold=6.098e+01, percent-clipped=0.0 2024-08-11 13:41:58,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1125170.0, ans=0.125 2024-08-11 13:42:20,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1125270.0, ans=0.0 2024-08-11 13:42:23,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2024-08-11 13:42:31,995 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-11 13:42:39,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11100, loss[loss=0.106, beats_loss=0.01127, ecapa_loss=0.000214, whisper_loss=0.09257, over 17471.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01119, ecapa_loss=0.0002014, whisper_loss=0.09403, over 3918478.22 frames. ], batch size: 72, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:43:03,775 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:43:16,490 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 13:43:27,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-11 13:43:33,341 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 13:43:38,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1125770.0, ans=0.125 2024-08-11 13:43:38,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2024-08-11 13:43:46,020 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.108e-01 2024-08-11 13:43:49,828 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 13:43:53,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1125870.0, ans=0.1 2024-08-11 13:44:01,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11150, loss[loss=0.0817, beats_loss=0.01289, ecapa_loss=0.0002159, whisper_loss=0.06665, over 19024.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01115, ecapa_loss=0.0001996, whisper_loss=0.09407, over 3912134.00 frames. ], batch size: 81, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:44:06,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.38 vs. 
limit=12.0 2024-08-11 13:44:08,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1125970.0, ans=0.2 2024-08-11 13:44:09,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.627e+01 3.035e+01 3.415e+01 6.543e+01, threshold=6.070e+01, percent-clipped=1.0 2024-08-11 13:44:33,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1126170.0, ans=0.0 2024-08-11 13:44:54,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1126270.0, ans=0.125 2024-08-11 13:45:11,435 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 13:45:15,893 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 13:45:18,315 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11200, loss[loss=0.09954, beats_loss=0.01279, ecapa_loss=0.0002056, whisper_loss=0.0847, over 21396.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01114, ecapa_loss=0.0001994, whisper_loss=0.09468, over 3929363.56 frames. ], batch size: 89, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:45:19,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-11 13:45:20,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1126470.0, ans=10.0 2024-08-11 13:45:37,470 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
26 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-11 13:45:37,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1126570.0, ans=0.025 2024-08-11 13:45:38,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.22 vs. limit=15.0 2024-08-11 13:45:43,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1126570.0, ans=0.125 2024-08-11 13:45:43,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1126570.0, ans=0.125 2024-08-11 13:45:43,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2024-08-11 13:45:55,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1126670.0, ans=0.07 2024-08-11 13:45:57,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1126670.0, ans=0.125 2024-08-11 13:46:03,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2024-08-11 13:46:05,975 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
30 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 13:46:06,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1126670.0, ans=0.125 2024-08-11 13:46:18,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1126770.0, ans=0.0 2024-08-11 13:46:20,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1126770.0, ans=0.1 2024-08-11 13:46:30,614 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 13:46:32,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1126870.0, ans=0.125 2024-08-11 13:46:35,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-11 13:46:42,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11250, loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0002, whisper_loss=0.09251, over 21711.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01123, ecapa_loss=0.0001988, whisper_loss=0.09463, over 3905325.41 frames. ], batch size: 93, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:46:43,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. 
limit=15.0 2024-08-11 13:46:47,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1126970.0, ans=0.0 2024-08-11 13:46:52,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.684e+01 2.944e+01 3.546e+01 6.829e+01, threshold=5.887e+01, percent-clipped=2.0 2024-08-11 13:46:56,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1126970.0, ans=15.0 2024-08-11 13:47:16,403 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 13:47:16,736 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:47:24,887 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 13:47:48,817 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 13:48:05,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11300, loss[loss=0.101, beats_loss=0.01416, ecapa_loss=0.0002212, whisper_loss=0.0846, over 22566.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01134, ecapa_loss=0.0001983, whisper_loss=0.09383, over 3911164.11 frames. ], batch size: 95, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:48:39,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1127670.0, ans=0.2 2024-08-11 13:48:50,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1127670.0, ans=0.125 2024-08-11 13:48:59,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1127770.0, ans=0.2 2024-08-11 13:49:08,352 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 13:49:23,966 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 13:49:25,196 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11350, loss[loss=0.1258, beats_loss=0.0102, ecapa_loss=0.0002191, whisper_loss=0.1134, over 21148.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01126, ecapa_loss=0.0001978, whisper_loss=0.09492, over 3922345.11 frames. ], batch size: 86, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:49:33,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.648e+01 3.083e+01 3.583e+01 5.645e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-11 13:49:41,850 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 13:49:51,384 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 13:50:21,998 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 13:50:22,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1128270.0, ans=0.125 2024-08-11 13:50:33,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1128370.0, ans=0.5 2024-08-11 13:50:40,845 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 13:50:44,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11400, loss[loss=0.1016, beats_loss=0.01304, ecapa_loss=0.0002483, whisper_loss=0.08603, over 22378.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01116, ecapa_loss=0.0001986, whisper_loss=0.09544, over 3889208.92 frames. 
], batch size: 93, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:50:44,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-11 13:50:45,734 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 13:50:46,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2024-08-11 13:51:02,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1128570.0, ans=0.1 2024-08-11 13:51:15,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1128670.0, ans=0.125 2024-08-11 13:51:22,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2024-08-11 13:51:26,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1128670.0, ans=0.0 2024-08-11 13:51:41,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1128770.0, ans=0.1 2024-08-11 13:51:45,494 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 13:51:56,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=22.5 2024-08-11 13:51:59,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11450, loss[loss=0.1151, beats_loss=0.01108, ecapa_loss=0.0002198, whisper_loss=0.1018, over 21459.00 frames. 
], tot_loss[loss=0.1079, beats_loss=0.01123, ecapa_loss=0.0001982, whisper_loss=0.09465, over 3890974.86 frames. ], batch size: 88, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:52:07,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.738e+01 3.140e+01 3.413e+01 5.128e+01, threshold=6.280e+01, percent-clipped=0.0 2024-08-11 13:53:08,002 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 13:53:09,649 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 13:53:17,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11500, loss[loss=0.09511, beats_loss=0.01178, ecapa_loss=0.0002338, whisper_loss=0.08099, over 21124.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01129, ecapa_loss=0.0001971, whisper_loss=0.09361, over 3872432.90 frames. ], batch size: 88, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:53:36,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1129570.0, ans=0.025 2024-08-11 13:53:37,367 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 13:54:04,317 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 13:54:11,475 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 13:54:16,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1129770.0, ans=0.0 2024-08-11 13:54:16,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-11 13:54:22,551 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
18 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 13:54:26,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1129870.0, ans=0.125 2024-08-11 13:54:33,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1129870.0, ans=0.125 2024-08-11 13:54:36,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11550, loss[loss=0.09191, beats_loss=0.01347, ecapa_loss=0.0001916, whisper_loss=0.07653, over 21919.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01129, ecapa_loss=0.0001965, whisper_loss=0.09363, over 3876312.28 frames. ], batch size: 92, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:54:40,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129970.0, ans=0.1 2024-08-11 13:54:45,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.847e+01 3.236e+01 3.830e+01 5.730e+01, threshold=6.473e+01, percent-clipped=0.0 2024-08-11 13:54:57,582 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 13:55:59,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11600, loss[loss=0.11, beats_loss=0.01169, ecapa_loss=0.0002145, whisper_loss=0.09615, over 22262.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0001993, whisper_loss=0.09373, over 3863011.93 frames. 
], batch size: 92, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:56:01,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1130470.0, ans=0.0 2024-08-11 13:56:28,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1130570.0, ans=0.05 2024-08-11 13:56:33,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1130670.0, ans=0.2 2024-08-11 13:57:17,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11650, loss[loss=0.1022, beats_loss=0.01342, ecapa_loss=0.0001703, whisper_loss=0.08705, over 22071.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.0001995, whisper_loss=0.09385, over 3900421.65 frames. ], batch size: 90, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:57:20,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1130970.0, ans=0.125 2024-08-11 13:57:22,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1130970.0, ans=0.125 2024-08-11 13:57:22,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1130970.0, ans=0.125 2024-08-11 13:57:26,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.647e+01 2.966e+01 3.476e+01 5.523e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 13:57:35,938 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 13:57:43,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1131070.0, ans=15.0 2024-08-11 13:57:44,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1131070.0, ans=0.2 2024-08-11 13:57:51,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-08-11 13:57:59,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1131170.0, ans=0.1 2024-08-11 13:58:05,141 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.349e+02 2024-08-11 13:58:14,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1131270.0, ans=0.125 2024-08-11 13:58:24,723 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 13:58:35,058 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11700, loss[loss=0.09084, beats_loss=0.01229, ecapa_loss=0.0002345, whisper_loss=0.0762, over 20345.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01119, ecapa_loss=0.0001991, whisper_loss=0.09416, over 3905699.15 frames. ], batch size: 84, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:58:51,157 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 13:58:51,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1131570.0, ans=0.0 2024-08-11 13:59:10,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1131670.0, ans=0.1 2024-08-11 13:59:11,171 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 13:59:39,127 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-11 13:59:43,586 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-11 13:59:45,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131870.0, ans=0.1 2024-08-11 13:59:52,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1131870.0, ans=0.1 2024-08-11 13:59:55,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1131970.0, ans=0.2 2024-08-11 13:59:56,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11750, loss[loss=0.1293, beats_loss=0.009781, ecapa_loss=0.0002017, whisper_loss=0.1175, over 20942.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01122, ecapa_loss=0.0001984, whisper_loss=0.09449, over 3918373.99 frames. ], batch size: 81, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:59:58,467 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 14:00:04,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+01 2.835e+01 3.323e+01 3.805e+01 1.328e+02, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 14:00:31,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1132170.0, ans=0.0 2024-08-11 14:01:12,718 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-11 14:01:15,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11800, loss[loss=0.1077, beats_loss=0.01083, ecapa_loss=0.0001941, whisper_loss=0.09489, over 14311.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01121, ecapa_loss=0.0001976, whisper_loss=0.09471, over 3921989.81 frames. ], batch size: 57, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:01:18,315 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 14:01:30,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1132570.0, ans=0.05 2024-08-11 14:01:41,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1132570.0, ans=0.07 2024-08-11 14:02:02,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1132770.0, ans=0.125 2024-08-11 14:02:16,039 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-11 14:02:30,199 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11850, loss[loss=0.1336, beats_loss=0.0104, ecapa_loss=0.0001902, whisper_loss=0.1213, over 22962.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01112, ecapa_loss=0.0001966, whisper_loss=0.09576, over 3949160.16 frames. 
], batch size: 90, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:02:33,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-08-11 14:02:38,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.696e+01 3.020e+01 3.645e+01 5.662e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 14:02:42,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1132970.0, ans=0.0 2024-08-11 14:03:11,689 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 14:03:20,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1133270.0, ans=0.1 2024-08-11 14:03:40,168 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 17 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 14:03:45,050 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 14:03:46,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11900, loss[loss=0.116, beats_loss=0.008687, ecapa_loss=0.0002241, whisper_loss=0.105, over 22086.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01109, ecapa_loss=0.0001985, whisper_loss=0.09577, over 3930290.15 frames. ], batch size: 88, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:03:51,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. 
limit=22.5 2024-08-11 14:04:02,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1133570.0, ans=0.125 2024-08-11 14:04:30,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1133670.0, ans=0.0 2024-08-11 14:04:34,587 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 14:04:34,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1133770.0, ans=0.2 2024-08-11 14:04:37,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1133770.0, ans=0.125 2024-08-11 14:04:46,798 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 14:04:57,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1133870.0, ans=0.04949747468305833 2024-08-11 14:04:58,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1133870.0, ans=0.0 2024-08-11 14:05:04,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 11950, loss[loss=0.08597, beats_loss=0.01438, ecapa_loss=0.0001748, whisper_loss=0.06984, over 13748.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01118, ecapa_loss=0.0001983, whisper_loss=0.09434, over 3913256.28 frames. 
], batch size: 55, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:05:12,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.574e+01 2.891e+01 3.292e+01 6.091e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-11 14:05:27,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1134070.0, ans=0.125 2024-08-11 14:05:35,708 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 14:05:49,568 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-11 14:06:07,178 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:06:11,158 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 38 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-11 14:06:11,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1134370.0, ans=0.125 2024-08-11 14:06:14,701 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 14:06:24,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12000, loss[loss=0.101, beats_loss=0.01229, ecapa_loss=0.0001905, whisper_loss=0.08677, over 17301.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01109, ecapa_loss=0.0001976, whisper_loss=0.09502, over 3924427.80 frames. ], batch size: 69, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:06:24,371 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 14:07:03,286 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006428, whisper_loss=0.2514, over 922467.00 frames. 
2024-08-11 14:07:22,295 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on SV_voxceleb1: loss=0.005208, beats_loss=0, ecapa_loss=0.0005208, whisper_loss=0, over 939242.00 frames. 2024-08-11 14:09:12,731 INFO [train_multi_KD3.py:1149] (1/4) Epoch 8, validation on AT_audioset: loss=0.02509, beats_loss=0.02509, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 14:09:12,734 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 14:09:12,937 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 14:09:13,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1134470.0, ans=0.0 2024-08-11 14:09:24,008 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 14:09:30,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-08-11 14:09:35,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1134570.0, ans=0.1 2024-08-11 14:10:01,042 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 14:10:09,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-08-11 14:10:10,166 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 14:10:17,281 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 14:10:17,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.14 vs. 
limit=15.0 2024-08-11 14:10:20,216 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 14:10:23,310 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-11 14:10:26,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12050, loss[loss=0.1046, beats_loss=0.01164, ecapa_loss=0.000172, whisper_loss=0.09128, over 16139.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01115, ecapa_loss=0.000198, whisper_loss=0.09415, over 3896329.36 frames. ], batch size: 61, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:10:28,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1134970.0, ans=0.0 2024-08-11 14:10:34,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.739e+01 2.961e+01 3.556e+01 5.317e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 14:10:35,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1134970.0, ans=0.1 2024-08-11 14:10:37,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1134970.0, ans=0.015 2024-08-11 14:10:40,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1134970.0, ans=0.2 2024-08-11 14:10:43,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1135070.0, ans=0.0 2024-08-11 14:10:52,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1135070.0, ans=0.2 2024-08-11 14:10:52,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.06 vs. 
limit=15.0 2024-08-11 14:11:13,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1135270.0, ans=0.0 2024-08-11 14:11:20,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1135270.0, ans=0.0 2024-08-11 14:11:34,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0 2024-08-11 14:11:42,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12100, loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0002134, whisper_loss=0.09274, over 22514.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01115, ecapa_loss=0.000199, whisper_loss=0.09351, over 3867141.33 frames. ], batch size: 95, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:11:44,054 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 31 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 14:12:03,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1135570.0, ans=0.125 2024-08-11 14:12:05,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1135570.0, ans=0.0 2024-08-11 14:12:12,385 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 14:12:19,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1135670.0, ans=0.0 2024-08-11 14:12:34,748 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.365e-01 2024-08-11 14:12:52,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12150, loss[loss=0.09802, beats_loss=0.01004, ecapa_loss=0.0002213, whisper_loss=0.08577, over 14995.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01117, ecapa_loss=0.0001989, whisper_loss=0.09343, over 3857541.60 frames. ], batch size: 60, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:12:59,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.505e+01 2.843e+01 3.166e+01 1.229e+02, threshold=5.686e+01, percent-clipped=1.0 2024-08-11 14:13:11,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2024-08-11 14:13:17,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1136070.0, ans=0.125 2024-08-11 14:13:20,964 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 14:13:25,351 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 14:13:34,218 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-11 14:13:39,514 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 14:13:49,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0 2024-08-11 14:13:52,052 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 14:14:00,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12200, loss[loss=0.09685, beats_loss=0.01201, ecapa_loss=0.000206, whisper_loss=0.08278, over 16483.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.0001989, whisper_loss=0.09331, over 3848198.74 frames. 
], batch size: 66, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:14:06,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1136470.0, ans=0.125 2024-08-11 14:14:11,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-08-11 14:14:31,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.05 vs. limit=22.5 2024-08-11 14:14:51,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1136770.0, ans=0.025 2024-08-11 14:14:56,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1136870.0, ans=0.2 2024-08-11 14:15:09,564 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12250, loss[loss=0.1152, beats_loss=0.01286, ecapa_loss=0.000208, whisper_loss=0.1003, over 21553.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0112, ecapa_loss=0.0001993, whisper_loss=0.09386, over 3863003.90 frames. ], batch size: 89, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:15:16,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.710e+01 3.098e+01 3.529e+01 5.582e+01, threshold=6.197e+01, percent-clipped=0.0 2024-08-11 14:15:26,346 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-11 14:15:27,729 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-11 14:15:29,144 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-11 14:15:39,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-11 14:15:41,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1137170.0, ans=0.015 2024-08-11 14:16:09,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1137370.0, ans=0.125 2024-08-11 14:16:12,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2024-08-11 14:16:19,030 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12300, loss[loss=0.1041, beats_loss=0.01024, ecapa_loss=0.0002301, whisper_loss=0.09159, over 18516.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01115, ecapa_loss=0.0002016, whisper_loss=0.09364, over 3870535.40 frames. ], batch size: 74, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:16:22,908 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS 2024-08-11 14:16:45,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1137670.0, ans=0.0 2024-08-11 14:16:57,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1137670.0, ans=0.0 2024-08-11 14:17:01,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1137770.0, ans=0.0 2024-08-11 14:17:28,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12350, loss[loss=0.112, beats_loss=0.01093, ecapa_loss=0.0002002, whisper_loss=0.09902, over 17119.00 frames.
], tot_loss[loss=0.1071, beats_loss=0.01115, ecapa_loss=0.0002021, whisper_loss=0.09389, over 3880547.86 frames. ], batch size: 65, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:17:36,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.771e+01 3.079e+01 3.408e+01 5.279e+01, threshold=6.158e+01, percent-clipped=0.0 2024-08-11 14:17:44,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1138070.0, ans=0.125 2024-08-11 14:17:57,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0 2024-08-11 14:18:01,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1138170.0, ans=0.2 2024-08-11 14:18:03,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1138170.0, ans=0.125 2024-08-11 14:18:04,939 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:18:20,929 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:18:35,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1138370.0, ans=0.05 2024-08-11 14:18:36,993 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 from AS 2024-08-11 14:18:41,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12400, loss[loss=0.1068, beats_loss=0.0125, ecapa_loss=0.0001584, whisper_loss=0.09272, over 19773.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.0002013, whisper_loss=0.0937, over 3848991.12 frames.
], batch size: 77, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:18:48,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-11 14:19:06,093 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 14:19:11,135 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 14:19:14,080 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 from AS 2024-08-11 14:19:51,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12450, loss[loss=0.108, beats_loss=0.01261, ecapa_loss=0.0001984, whisper_loss=0.09338, over 21590.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01119, ecapa_loss=0.0002017, whisper_loss=0.09384, over 3873306.36 frames. ], batch size: 88, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:19:59,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.783e+01 3.134e+01 3.561e+01 9.376e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-11 14:20:01,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1138970.0, ans=0.0 2024-08-11 14:20:14,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1139070.0, ans=0.02 2024-08-11 14:20:15,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1139070.0, ans=0.125 2024-08-11 14:20:21,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1139170.0, ans=0.2 2024-08-11 14:20:25,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1139170.0, ans=0.0 2024-08-11 14:20:25,746 INFO
[scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-11 14:20:45,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2024-08-11 14:20:58,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1139370.0, ans=0.0 2024-08-11 14:21:02,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1139370.0, ans=0.125 2024-08-11 14:21:04,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12500, loss[loss=0.1156, beats_loss=0.01025, ecapa_loss=0.0001919, whisper_loss=0.1034, over 17981.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01116, ecapa_loss=0.000201, whisper_loss=0.0941, over 3859342.37 frames. ], batch size: 71, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:21:19,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1139570.0, ans=0.1 2024-08-11 14:21:21,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1139570.0, ans=10.0 2024-08-11 14:21:36,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1139670.0, ans=0.025 2024-08-11 14:21:37,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.64 vs. limit=22.5 2024-08-11 14:21:43,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139670.0, ans=0.1 2024-08-11 14:21:53,190 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-11 14:21:54,546 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 12 from Vox, 39 from AS 2024-08-11 14:21:54,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1139770.0, ans=0.0 2024-08-11 14:22:10,495 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 from AS 2024-08-11 14:22:20,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12550, loss[loss=0.1061, beats_loss=0.01284, ecapa_loss=0.0002024, whisper_loss=0.09126, over 19945.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0002002, whisper_loss=0.09321, over 3867607.72 frames. ], batch size: 80, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:22:27,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.780e+01 3.157e+01 3.733e+01 7.024e+01, threshold=6.315e+01, percent-clipped=2.0 2024-08-11 14:22:33,555 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 35 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 14:22:37,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1140070.0, ans=0.125 2024-08-11 14:22:47,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=8.0 2024-08-11 14:22:53,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1140170.0, ans=0.0 2024-08-11 14:22:53,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.34 vs.
limit=15.0 2024-08-11 14:22:56,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140170.0, ans=0.1 2024-08-11 14:22:57,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1140170.0, ans=0.125 2024-08-11 14:23:09,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1140270.0, ans=0.125 2024-08-11 14:23:28,676 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 from AS 2024-08-11 14:23:34,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12600, loss[loss=0.114, beats_loss=0.01091, ecapa_loss=0.0002065, whisper_loss=0.101, over 20248.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01138, ecapa_loss=0.0002022, whisper_loss=0.09247, over 3871148.91 frames. ], batch size: 80, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:23:37,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1140470.0, ans=0.125 2024-08-11 14:23:47,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1140570.0, ans=0.0 2024-08-11 14:23:51,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1140570.0, ans=0.0 2024-08-11 14:23:53,295 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 from AS 2024-08-11 14:23:55,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-08-11 14:24:00,754 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts.
20 from LS+wenet, 18 from Vox, 31 from AS 2024-08-11 14:24:07,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1140670.0, ans=0.025 2024-08-11 14:24:16,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-11 14:24:35,789 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 from AS 2024-08-11 14:24:43,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-11 14:24:48,515 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12650, loss[loss=0.1132, beats_loss=0.01009, ecapa_loss=0.000252, whisper_loss=0.1006, over 21136.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01134, ecapa_loss=0.000201, whisper_loss=0.09278, over 3870705.79 frames. ], batch size: 91, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:24:55,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.818e+01 3.225e+01 3.809e+01 6.974e+01, threshold=6.451e+01, percent-clipped=1.0 2024-08-11 14:25:26,524 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-11 14:25:34,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1141270.0, ans=0.125 2024-08-11 14:25:35,879 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-08-11 14:26:00,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.44 vs.
limit=10.0 2024-08-11 14:26:00,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12700, loss[loss=0.1144, beats_loss=0.01043, ecapa_loss=0.0001985, whisper_loss=0.102, over 22559.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001997, whisper_loss=0.09319, over 3844811.89 frames. ], batch size: 92, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:26:05,761 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 33 from Vox, 33 from AS 2024-08-11 14:26:30,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2024-08-11 14:26:31,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1141670.0, ans=0.125 2024-08-11 14:27:00,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1141870.0, ans=0.2 2024-08-11 14:27:01,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141870.0, ans=0.1 2024-08-11 14:27:10,623 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12750, loss[loss=0.127, beats_loss=0.01118, ecapa_loss=0.0001717, whisper_loss=0.1141, over 22489.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01129, ecapa_loss=0.0001995, whisper_loss=0.0935, over 3825775.65 frames. ], batch size: 87, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:27:12,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1141970.0, ans=0.0 2024-08-11 14:27:12,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.04 vs.
limit=22.5 2024-08-11 14:27:14,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1141970.0, ans=0.1 2024-08-11 14:27:17,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.661e+01 2.986e+01 3.443e+01 7.051e+01, threshold=5.972e+01, percent-clipped=1.0 2024-08-11 14:27:27,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-11 14:27:46,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1142170.0, ans=0.1 2024-08-11 14:27:53,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1142270.0, ans=0.2 2024-08-11 14:28:05,671 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-11 14:28:07,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.02 vs. limit=22.5 2024-08-11 14:28:11,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1142370.0, ans=0.125 2024-08-11 14:28:20,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12800, loss[loss=0.1101, beats_loss=0.01141, ecapa_loss=0.0001478, whisper_loss=0.09717, over 20272.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01127, ecapa_loss=0.0002003, whisper_loss=0.09374, over 3838491.44 frames. ], batch size: 77, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:28:29,084 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
30 from LS+wenet, 26 from Vox, 32 from AS 2024-08-11 14:28:32,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=6.0 2024-08-11 14:28:36,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1142570.0, ans=0.0 2024-08-11 14:28:44,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1142570.0, ans=0.1 2024-08-11 14:28:49,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2024-08-11 14:29:31,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12850, loss[loss=0.1202, beats_loss=0.01377, ecapa_loss=0.0001678, whisper_loss=0.1048, over 22528.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0112, ecapa_loss=0.0002023, whisper_loss=0.09365, over 3843866.42 frames. ], batch size: 89, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:29:38,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.679e+01 2.923e+01 3.402e+01 6.033e+01, threshold=5.846e+01, percent-clipped=2.0 2024-08-11 14:29:42,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-08-11 14:29:58,837 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts.
29 from LS+wenet, 29 from Vox, 37 from AS 2024-08-11 14:30:01,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1143170.0, ans=0.125 2024-08-11 14:30:10,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1143170.0, ans=0.0 2024-08-11 14:30:12,286 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 from AS 2024-08-11 14:30:20,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2024-08-11 14:30:23,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1143270.0, ans=0.2 2024-08-11 14:30:27,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1143370.0, ans=0.1 2024-08-11 14:30:29,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1143370.0, ans=0.2 2024-08-11 14:30:40,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12900, loss[loss=0.09923, beats_loss=0.01005, ecapa_loss=0.0001903, whisper_loss=0.08727, over 14239.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0002009, whisper_loss=0.09282, over 3843937.62 frames.
], batch size: 55, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:30:46,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1143470.0, ans=0.1 2024-08-11 14:30:50,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1143470.0, ans=0.0 2024-08-11 14:30:54,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1143570.0, ans=0.125 2024-08-11 14:30:56,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1143570.0, ans=0.0 2024-08-11 14:31:00,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-11 14:31:07,976 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 14:31:21,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1143770.0, ans=0.04949747468305833 2024-08-11 14:31:25,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1143770.0, ans=0.95 2024-08-11 14:31:29,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1143770.0, ans=0.125 2024-08-11 14:31:31,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1143770.0, ans=0.0 2024-08-11 14:31:40,456 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-11 14:31:45,645 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts.
23 from LS+wenet, 9 from Vox, 26 from AS 2024-08-11 14:31:47,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2024-08-11 14:31:48,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 12950, loss[loss=0.1089, beats_loss=0.0108, ecapa_loss=0.0001906, whisper_loss=0.0962, over 21674.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.000201, whisper_loss=0.0926, over 3834628.25 frames. ], batch size: 88, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:31:54,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.619e+01 2.896e+01 3.261e+01 4.562e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 14:32:03,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1144070.0, ans=0.07 2024-08-11 14:32:07,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1144070.0, ans=0.0 2024-08-11 14:32:13,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1144170.0, ans=0.1 2024-08-11 14:32:22,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1144170.0, ans=0.125 2024-08-11 14:32:22,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1144170.0, ans=0.1 2024-08-11 14:32:24,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1144170.0, ans=0.125 2024-08-11 14:32:26,555 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts.
29 from LS+wenet, 19 from Vox, 46 from AS 2024-08-11 14:32:29,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-11 14:32:36,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1144270.0, ans=10.0 2024-08-11 14:32:47,395 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 from AS 2024-08-11 14:32:55,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13000, loss[loss=0.1038, beats_loss=0.01113, ecapa_loss=0.0001965, whisper_loss=0.09068, over 20895.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01138, ecapa_loss=0.0001982, whisper_loss=0.0917, over 3870729.39 frames. ], batch size: 86, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:33:02,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1144470.0, ans=0.2 2024-08-11 14:33:19,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1144570.0, ans=0.125 2024-08-11 14:33:31,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-11 14:33:33,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.05 vs. limit=22.5 2024-08-11 14:33:34,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1144770.0, ans=0.1 2024-08-11 14:33:38,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs.
limit=15.0 2024-08-11 14:34:00,584 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 14:34:01,624 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13050, loss[loss=0.1127, beats_loss=0.01168, ecapa_loss=0.0001812, whisper_loss=0.09915, over 22648.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0113, ecapa_loss=0.0001976, whisper_loss=0.09257, over 3845852.30 frames. ], batch size: 90, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:34:09,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.663e+01 3.009e+01 3.543e+01 5.736e+01, threshold=6.018e+01, percent-clipped=0.0 2024-08-11 14:34:10,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1144970.0, ans=0.1 2024-08-11 14:34:14,863 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 14:34:15,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1145070.0, ans=0.09899494936611666 2024-08-11 14:34:24,420 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS 2024-08-11 14:34:36,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1145170.0, ans=0.0 2024-08-11 14:34:50,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1145270.0, ans=0.125 2024-08-11 14:35:00,845 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts.
20 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 14:35:01,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1145370.0, ans=0.07 2024-08-11 14:35:08,560 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13100, loss[loss=0.1057, beats_loss=0.01175, ecapa_loss=0.0001976, whisper_loss=0.09201, over 21651.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01124, ecapa_loss=0.0001973, whisper_loss=0.09288, over 3837376.65 frames. ], batch size: 91, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:35:19,443 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 from AS 2024-08-11 14:35:57,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1145770.0, ans=0.09899494936611666 2024-08-11 14:36:07,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1145870.0, ans=0.125 2024-08-11 14:36:16,084 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13150, loss[loss=0.1054, beats_loss=0.01245, ecapa_loss=0.0002153, whisper_loss=0.09084, over 23146.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01126, ecapa_loss=0.0001973, whisper_loss=0.09306, over 3819243.38 frames. ], batch size: 95, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:36:22,107 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 14:36:23,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1145970.0, ans=0.1 2024-08-11 14:36:24,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.646e+01 3.074e+01 3.551e+01 7.415e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-11 14:36:24,622 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts.
15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-11 14:36:26,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1145970.0, ans=0.2 2024-08-11 14:36:27,491 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 from AS 2024-08-11 14:36:32,904 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 14:36:39,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-11 14:36:50,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1146170.0, ans=0.125 2024-08-11 14:37:04,262 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-11 14:37:10,326 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 from AS 2024-08-11 14:37:23,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1146370.0, ans=0.0 2024-08-11 14:37:25,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13200, loss[loss=0.111, beats_loss=0.01267, ecapa_loss=0.0001766, whisper_loss=0.09657, over 22028.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01134, ecapa_loss=0.0001979, whisper_loss=0.09242, over 3847097.99 frames. ], batch size: 88, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:37:30,849 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 from AS 2024-08-11 14:37:33,620 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 20 from Vox, 27 from AS 2024-08-11 14:37:34,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.75 vs.
limit=22.5 2024-08-11 14:37:44,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1146570.0, ans=0.0 2024-08-11 14:37:52,227 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 from AS 2024-08-11 14:38:01,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1146670.0, ans=0.2 2024-08-11 14:38:01,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1146670.0, ans=0.125 2024-08-11 14:38:07,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1146770.0, ans=0.125 2024-08-11 14:38:10,931 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 14:38:17,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2024-08-11 14:38:20,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1146870.0, ans=0.125 2024-08-11 14:38:29,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146870.0, ans=0.1 2024-08-11 14:38:31,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13250, loss[loss=0.09562, beats_loss=0.01054, ecapa_loss=0.0002307, whisper_loss=0.08278, over 22469.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01127, ecapa_loss=0.0001987, whisper_loss=0.09315, over 3842338.04 frames.
], batch size: 93, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:38:39,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.719e+01 3.002e+01 3.497e+01 5.724e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 14:39:19,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1147270.0, ans=0.125 2024-08-11 14:39:31,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1147370.0, ans=0.1 2024-08-11 14:39:38,736 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13300, loss[loss=0.09852, beats_loss=0.01343, ecapa_loss=0.0001782, whisper_loss=0.08331, over 20004.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01131, ecapa_loss=0.0001978, whisper_loss=0.0939, over 3869322.03 frames. ], batch size: 81, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:39:44,327 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 14:39:54,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1147570.0, ans=0.125 2024-08-11 14:40:07,475 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-11 14:40:17,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. 
limit=22.5 2024-08-11 14:40:28,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1147770.0, ans=0.125 2024-08-11 14:40:44,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13350, loss[loss=0.1229, beats_loss=0.01197, ecapa_loss=0.0001841, whisper_loss=0.1091, over 16988.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0113, ecapa_loss=0.0001965, whisper_loss=0.09445, over 3863821.55 frames. ], batch size: 68, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:40:53,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.881e+01 3.191e+01 3.673e+01 5.435e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 14:40:55,936 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 14:40:56,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1147970.0, ans=0.0 2024-08-11 14:41:19,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1148170.0, ans=0.1 2024-08-11 14:41:32,803 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 14:41:37,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-11 14:41:44,904 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 14:41:52,696 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13400, loss[loss=0.09096, beats_loss=0.01267, ecapa_loss=0.0002061, whisper_loss=0.07623, over 15629.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0113, ecapa_loss=0.0001956, whisper_loss=0.09423, over 3873131.32 frames. 
], batch size: 63, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:42:03,474 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-11 14:42:31,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-11 14:42:46,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.86 vs. limit=22.5 2024-08-11 14:42:59,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13450, loss[loss=0.1014, beats_loss=0.008639, ecapa_loss=0.0002366, whisper_loss=0.09041, over 16221.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01125, ecapa_loss=0.0001971, whisper_loss=0.09384, over 3880248.51 frames. ], batch size: 65, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:42:59,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1148970.0, ans=0.2 2024-08-11 14:43:06,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.672e+01 2.998e+01 3.496e+01 5.811e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 14:43:10,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1148970.0, ans=0.0 2024-08-11 14:43:11,253 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 14:43:14,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2024-08-11 14:43:15,757 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 14:43:29,960 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
34 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 14:44:00,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1149370.0, ans=0.0 2024-08-11 14:44:06,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13500, loss[loss=0.1154, beats_loss=0.0122, ecapa_loss=0.0001609, whisper_loss=0.1016, over 23055.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01126, ecapa_loss=0.0001978, whisper_loss=0.09397, over 3873816.02 frames. ], batch size: 90, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:44:18,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1149470.0, ans=0.125 2024-08-11 14:44:19,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1149570.0, ans=0.0 2024-08-11 14:44:22,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1149570.0, ans=0.2 2024-08-11 14:44:23,680 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 14:44:48,157 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 14:44:51,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1149770.0, ans=0.0 2024-08-11 14:44:54,957 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 14:45:04,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1149870.0, ans=0.125 2024-08-11 14:45:12,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1149970.0, ans=0.125 2024-08-11 14:45:13,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13550, loss[loss=0.09543, beats_loss=0.01209, ecapa_loss=0.0001588, whisper_loss=0.08176, over 19221.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01131, ecapa_loss=0.0001977, whisper_loss=0.09377, over 3887289.00 frames. ], batch size: 77, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:45:20,870 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 23 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-11 14:45:22,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.724e+01 3.026e+01 3.356e+01 6.368e+01, threshold=6.052e+01, percent-clipped=1.0 2024-08-11 14:45:26,245 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 14:45:54,434 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 14:46:02,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1150270.0, ans=0.0 2024-08-11 14:46:03,446 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-11 14:46:04,815 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-11 14:46:07,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=1150370.0, ans=22.5 2024-08-11 14:46:15,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1150370.0, ans=0.0 2024-08-11 14:46:20,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13600, loss[loss=0.1009, beats_loss=0.0118, ecapa_loss=0.0001456, whisper_loss=0.08765, over 14461.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01132, ecapa_loss=0.0001969, whisper_loss=0.09432, over 3906499.37 frames. ], batch size: 55, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:46:23,536 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 14:46:32,856 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-11 14:46:35,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0 2024-08-11 14:46:37,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0 2024-08-11 14:46:45,108 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 14:46:51,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1150670.0, ans=0.1 2024-08-11 14:47:12,515 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 14:47:27,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13650, loss[loss=0.1089, beats_loss=0.01098, ecapa_loss=0.0001854, whisper_loss=0.09602, over 19647.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01142, ecapa_loss=0.0001969, whisper_loss=0.09416, over 3940970.00 frames. ], batch size: 79, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:47:32,467 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:47:34,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.952e+01 3.395e+01 3.813e+01 5.359e+01, threshold=6.790e+01, percent-clipped=0.0 2024-08-11 14:47:48,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1151070.0, ans=0.0 2024-08-11 14:48:14,314 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 14:48:34,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13700, loss[loss=0.1059, beats_loss=0.01161, ecapa_loss=0.0002092, whisper_loss=0.0922, over 15382.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0001987, whisper_loss=0.09402, over 3911110.66 frames. ], batch size: 64, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:48:43,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1151470.0, ans=0.1 2024-08-11 14:48:46,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1151470.0, ans=0.2 2024-08-11 14:49:28,384 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 14:49:40,086 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 14:49:41,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13750, loss[loss=0.06866, beats_loss=0.01365, ecapa_loss=0.0001701, whisper_loss=0.05331, over 14523.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01142, ecapa_loss=0.0001977, whisper_loss=0.09353, over 3900342.31 frames. ], batch size: 58, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:49:41,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1151970.0, ans=0.125 2024-08-11 14:49:49,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.564e+01 2.884e+01 3.394e+01 1.263e+02, threshold=5.769e+01, percent-clipped=1.0 2024-08-11 14:49:52,468 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 14:50:00,601 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 14:50:30,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1152270.0, ans=0.125 2024-08-11 14:50:34,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1152370.0, ans=0.125 2024-08-11 14:50:43,962 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:50:48,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13800, loss[loss=0.1081, beats_loss=0.01187, ecapa_loss=0.0002155, whisper_loss=0.09403, over 22264.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01139, ecapa_loss=0.0001968, whisper_loss=0.09344, over 3894832.90 frames. ], batch size: 91, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:50:50,179 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 14:51:06,301 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 14:51:07,490 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 14:51:10,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1152570.0, ans=0.125 2024-08-11 14:51:22,142 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 14:51:23,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1152670.0, ans=0.0 2024-08-11 14:51:26,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1152670.0, ans=0.0 2024-08-11 14:51:37,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1152770.0, ans=15.0 2024-08-11 14:51:55,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13850, loss[loss=0.1193, beats_loss=0.01077, ecapa_loss=0.0001605, whisper_loss=0.1069, over 17622.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001955, whisper_loss=0.09344, over 3906745.11 frames. ], batch size: 67, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:52:03,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.659e+01 3.124e+01 3.574e+01 6.862e+01, threshold=6.248e+01, percent-clipped=1.0 2024-08-11 14:53:00,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1153470.0, ans=0.2 2024-08-11 14:53:01,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13900, loss[loss=0.1029, beats_loss=0.01129, ecapa_loss=0.0002045, whisper_loss=0.0896, over 19256.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.0001949, whisper_loss=0.09418, over 3933114.64 frames. ], batch size: 75, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:53:12,399 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
14 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 14:53:22,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1153570.0, ans=0.0 2024-08-11 14:53:30,482 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2024-08-11 14:53:31,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-11 14:53:33,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1153670.0, ans=0.1 2024-08-11 14:53:42,899 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-11 14:54:01,260 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 14:54:07,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 13950, loss[loss=0.099, beats_loss=0.01139, ecapa_loss=0.000209, whisper_loss=0.08552, over 19400.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01137, ecapa_loss=0.0001952, whisper_loss=0.09404, over 3909748.81 frames. 
], batch size: 81, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:54:15,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.781e+01 3.096e+01 3.577e+01 5.485e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 14:54:18,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1153970.0, ans=0.0 2024-08-11 14:54:22,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1154070.0, ans=10.0 2024-08-11 14:54:24,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=12.0 2024-08-11 14:54:30,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1154070.0, ans=0.0 2024-08-11 14:54:37,042 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 14:54:39,758 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 14:54:40,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1154170.0, ans=0.125 2024-08-11 14:54:48,543 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 14:55:11,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1154370.0, ans=0.125 2024-08-11 14:55:16,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14000, loss[loss=0.09963, beats_loss=0.01313, ecapa_loss=0.0001712, whisper_loss=0.08479, over 19251.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01138, ecapa_loss=0.0001944, whisper_loss=0.09387, over 3910508.61 frames. 
], batch size: 78, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:55:27,825 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 14:56:04,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1154770.0, ans=0.125 2024-08-11 14:56:09,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1154770.0, ans=0.125 2024-08-11 14:56:22,091 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.857e+05 2024-08-11 14:56:27,397 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14050, loss[loss=0.1063, beats_loss=0.01191, ecapa_loss=0.0001826, whisper_loss=0.09256, over 22979.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01141, ecapa_loss=0.0001931, whisper_loss=0.09351, over 3910488.19 frames. ], batch size: 92, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:56:36,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.747e+01 3.034e+01 3.556e+01 6.486e+01, threshold=6.067e+01, percent-clipped=1.0 2024-08-11 14:56:39,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2024-08-11 14:56:55,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1155070.0, ans=0.125 2024-08-11 14:56:57,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2024-08-11 14:57:09,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1155170.0, ans=0.125 2024-08-11 14:57:10,724 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.252e+01 2024-08-11 14:57:17,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1155270.0, ans=0.5 2024-08-11 14:57:33,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1155370.0, ans=0.0 2024-08-11 14:57:39,542 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 14:57:42,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1155470.0, ans=0.5 2024-08-11 14:57:43,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14100, loss[loss=0.1208, beats_loss=0.01102, ecapa_loss=0.0001564, whisper_loss=0.1082, over 17991.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01147, ecapa_loss=0.000191, whisper_loss=0.09299, over 3888397.53 frames. ], batch size: 69, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:57:58,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1155570.0, ans=0.0 2024-08-11 14:58:00,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1155570.0, ans=0.0 2024-08-11 14:58:45,751 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 14:58:48,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1155870.0, ans=0.0 2024-08-11 14:58:59,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14150, loss[loss=0.1232, beats_loss=0.01099, ecapa_loss=0.0002109, whisper_loss=0.1101, over 22111.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01137, ecapa_loss=0.0001945, whisper_loss=0.09376, over 3898986.91 frames. ], batch size: 90, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:59:04,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1155970.0, ans=6.0 2024-08-11 14:59:05,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1155970.0, ans=0.125 2024-08-11 14:59:08,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.682e+01 3.045e+01 3.525e+01 6.405e+01, threshold=6.090e+01, percent-clipped=1.0 2024-08-11 14:59:13,941 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 14:59:19,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1156070.0, ans=0.125 2024-08-11 14:59:42,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=1156170.0, ans=0.2 2024-08-11 14:59:44,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. 
limit=15.0 2024-08-11 14:59:53,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1156270.0, ans=0.125 2024-08-11 14:59:57,169 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 14:59:58,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1156270.0, ans=0.95 2024-08-11 15:00:11,878 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 19 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 15:00:17,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14200, loss[loss=0.1163, beats_loss=0.00864, ecapa_loss=0.0002692, whisper_loss=0.105, over 16640.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01144, ecapa_loss=0.0001946, whisper_loss=0.09273, over 3865204.47 frames. ], batch size: 69, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:00:26,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1156470.0, ans=0.125 2024-08-11 15:00:56,666 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 15:01:01,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1156770.0, ans=0.025 2024-08-11 15:01:02,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1156770.0, ans=0.125 2024-08-11 15:01:13,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-08-11 15:01:32,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14250, loss[loss=0.09756, beats_loss=0.01389, ecapa_loss=0.0002139, whisper_loss=0.08153, over 20733.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.0114, ecapa_loss=0.0001949, whisper_loss=0.09304, over 3853947.01 frames. ], batch size: 89, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:01:43,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.820e+01 3.214e+01 3.813e+01 8.671e+01, threshold=6.428e+01, percent-clipped=3.0 2024-08-11 15:01:51,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1157070.0, ans=0.1 2024-08-11 15:02:09,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1157170.0, ans=0.0 2024-08-11 15:02:13,551 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 15:02:20,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1157270.0, ans=0.125 2024-08-11 15:02:28,050 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 15:02:39,409 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-11 15:02:48,025 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 15:02:49,797 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 15:02:52,813 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14300, loss[loss=0.1494, beats_loss=0.007776, ecapa_loss=0.0002018, whisper_loss=0.1396, over 21296.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01144, ecapa_loss=0.0001938, whisper_loss=0.09279, over 3857194.87 frames. 
], batch size: 81, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:02:53,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1157470.0, ans=0.04949747468305833 2024-08-11 15:03:21,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1157670.0, ans=0.0 2024-08-11 15:03:21,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1157670.0, ans=0.09899494936611666 2024-08-11 15:03:23,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1157670.0, ans=0.125 2024-08-11 15:03:28,776 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 15:03:36,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1157770.0, ans=0.125 2024-08-11 15:04:02,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1157870.0, ans=0.0 2024-08-11 15:04:07,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1157970.0, ans=0.04949747468305833 2024-08-11 15:04:07,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14350, loss[loss=0.1243, beats_loss=0.009342, ecapa_loss=0.0001737, whisper_loss=0.1132, over 16446.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.0001941, whisper_loss=0.09302, over 3855528.07 frames. ], batch size: 64, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:04:09,530 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 15:04:16,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.896e+01 3.266e+01 3.801e+01 1.000e+02, threshold=6.532e+01, percent-clipped=2.0 2024-08-11 15:04:23,818 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 34 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 15:04:24,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-11 15:04:27,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1158070.0, ans=0.125 2024-08-11 15:04:32,765 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 15:04:43,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1158170.0, ans=0.1 2024-08-11 15:04:45,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.73 vs. limit=22.5 2024-08-11 15:04:48,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1158170.0, ans=0.025 2024-08-11 15:04:50,638 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 15:04:52,134 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-11 15:04:55,781 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
24 from LS+wenet, 17 from Vox, 24 from AS 2024-08-11 15:04:57,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1158270.0, ans=0.0 2024-08-11 15:05:01,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1158270.0, ans=0.0 2024-08-11 15:05:02,057 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-11 15:05:05,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-11 15:05:13,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1158370.0, ans=0.125 2024-08-11 15:05:16,633 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 29 from LS+wenet, 9 from Vox, 29 from AS 2024-08-11 15:05:23,675 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14400, loss[loss=0.1068, beats_loss=0.01214, ecapa_loss=0.000211, whisper_loss=0.09253, over 22254.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01136, ecapa_loss=0.0001947, whisper_loss=0.09271, over 3873051.32 frames. ], batch size: 90, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:05:34,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1158470.0, ans=0.0 2024-08-11 15:05:41,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1158570.0, ans=0.125 2024-08-11 15:05:55,827 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 15:05:58,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1158670.0, ans=0.07 2024-08-11 15:06:15,910 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
16 from LS+wenet, 25 from Vox, 22 from AS 2024-08-11 15:06:30,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1158870.0, ans=0.125 2024-08-11 15:06:31,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-11 15:06:32,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1158870.0, ans=0.1 2024-08-11 15:06:39,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 8, batch 14450, loss[loss=0.1233, beats_loss=0.009786, ecapa_loss=0.000171, whisper_loss=0.1118, over 15356.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01132, ecapa_loss=0.0001957, whisper_loss=0.09265, over 3870316.88 frames. ], batch size: 58, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:06:39,294 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 from AS 2024-08-11 15:06:39,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1158970.0, ans=0.125 2024-08-11 15:06:48,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.733e+01 3.088e+01 3.504e+01 7.570e+01, threshold=6.176e+01, percent-clipped=1.0 2024-08-11 15:07:02,843 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 from AS 2024-08-11 15:07:05,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1159070.0, ans=0.2 2024-08-11 15:07:11,303 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 10 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 15:07:21,509 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
14 from LS+wenet, 12 from Vox, 39 from AS 2024-08-11 15:07:27,997 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.773e-01 2024-08-11 15:07:29,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159270.0, ans=0.1 2024-08-11 15:07:30,577 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 15:08:19,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 0, loss[loss=0.09984, beats_loss=0.01278, ecapa_loss=0.0001839, whisper_loss=0.08523, over 21590.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01278, ecapa_loss=0.0001839, whisper_loss=0.08523, over 21590.00 frames. ], batch size: 83, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:08:19,712 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 15:08:56,618 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006493, whisper_loss=0.2513, over 922467.00 frames. 2024-08-11 15:09:15,539 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on SV_voxceleb1: loss=0.005328, beats_loss=0, ecapa_loss=0.0005328, whisper_loss=0, over 939242.00 frames. 2024-08-11 15:11:18,919 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on AT_audioset: loss=0.0249, beats_loss=0.0249, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 15:11:18,921 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 15:12:19,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-11 15:13:41,687 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
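The three validation entries above exercise one distillation head each (whisper for ASR_libri, ecapa for SV_voxceleb1, beats for AT_audioset), and the per-batch `loss` values in this log are consistent with a weighted sum of the three component losses using the scales from the configuration header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A sketch of the combination, assuming a plain weighted sum (the function name and signature are illustrative, not the script's actual API):

```python
def combine_losses(beats_loss, ecapa_loss, whisper_loss,
                   beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three KD losses; defaults match the header config."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Epoch 8, batch 14350 logged: loss=0.1243, beats_loss=0.009342,
# ecapa_loss=0.0001737, whisper_loss=0.1132
print(combine_losses(0.009342, 0.0001737, 0.1132))  # approximately 0.1243
```

This explains why the small raw `ecapa_loss` values (~2e-4) still contribute visibly to the total: they are scaled by 10 before summation.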
25 from LS+wenet, 10 from Vox, 30 from AS 2024-08-11 15:14:02,695 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.688e-01 2024-08-11 15:14:08,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1159780.0, ans=0.125 2024-08-11 15:14:32,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 50, loss[loss=0.1124, beats_loss=0.01178, ecapa_loss=0.0002335, whisper_loss=0.09826, over 22486.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0002009, whisper_loss=0.0919, over 870154.64 frames. ], batch size: 92, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:14:43,302 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.946e-02 2024-08-11 15:15:21,774 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS 2024-08-11 15:15:37,903 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-11 15:15:52,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.906e+01 3.207e+01 3.715e+01 5.089e+01, threshold=6.415e+01, percent-clipped=0.0 2024-08-11 15:16:01,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-08-11 15:16:24,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0 2024-08-11 15:16:36,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. 
limit=15.0 2024-08-11 15:18:01,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1160180.0, ans=0.0 2024-08-11 15:19:03,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 100, loss[loss=0.1126, beats_loss=0.009189, ecapa_loss=0.0002324, whisper_loss=0.1011, over 19384.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001972, whisper_loss=0.09049, over 1534763.95 frames. ], batch size: 78, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:19:46,862 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-11 15:20:30,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1160480.0, ans=0.125 2024-08-11 15:20:47,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1160580.0, ans=0.0 2024-08-11 15:21:02,408 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 from AS 2024-08-11 15:21:09,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1160680.0, ans=0.125 2024-08-11 15:21:35,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1160780.0, ans=0.125 2024-08-11 15:21:38,892 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 15:21:42,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. 
limit=22.5 2024-08-11 15:22:03,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 150, loss[loss=0.1022, beats_loss=0.01164, ecapa_loss=0.0002223, whisper_loss=0.08833, over 21992.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01057, ecapa_loss=0.0001988, whisper_loss=0.0918, over 2027130.42 frames. ], batch size: 90, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:22:40,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1160980.0, ans=0.125 2024-08-11 15:22:43,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.986e+01 3.190e+01 3.682e+01 6.515e+01, threshold=6.380e+01, percent-clipped=1.0 2024-08-11 15:23:07,170 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 15:23:22,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1161180.0, ans=0.04949747468305833 2024-08-11 15:23:30,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2024-08-11 15:23:44,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1161280.0, ans=0.2 2024-08-11 15:24:02,900 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 200, loss[loss=0.1212, beats_loss=0.009958, ecapa_loss=0.0002045, whisper_loss=0.1091, over 22920.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01059, ecapa_loss=0.0001983, whisper_loss=0.09349, over 2430330.68 frames. ], batch size: 89, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:24:15,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. 
limit=15.0 2024-08-11 15:24:16,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1161380.0, ans=0.2 2024-08-11 15:24:36,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-08-11 15:24:54,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-11 15:25:05,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1161680.0, ans=0.0 2024-08-11 15:25:11,223 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 10 from Vox, 31 from AS 2024-08-11 15:25:35,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 250, loss[loss=0.09277, beats_loss=0.01207, ecapa_loss=0.0001871, whisper_loss=0.07883, over 16516.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01082, ecapa_loss=0.0001947, whisper_loss=0.09272, over 2732438.02 frames. ], batch size: 65, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:25:41,054 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 15:25:50,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2024-08-11 15:26:02,192 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
19 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 15:26:05,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.655e+01 2.964e+01 3.308e+01 4.229e+01, threshold=5.928e+01, percent-clipped=0.0 2024-08-11 15:26:31,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1162080.0, ans=0.125 2024-08-11 15:26:37,697 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 from AS 2024-08-11 15:26:55,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1162180.0, ans=0.125 2024-08-11 15:26:55,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1162180.0, ans=15.0 2024-08-11 15:27:02,851 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 from AS 2024-08-11 15:27:06,437 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 12 from Vox, 24 from AS 2024-08-11 15:27:08,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1162280.0, ans=0.125 2024-08-11 15:27:08,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=22.5 2024-08-11 15:27:21,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 300, loss[loss=0.09482, beats_loss=0.009777, ecapa_loss=0.0001999, whisper_loss=0.08305, over 18144.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001939, whisper_loss=0.09159, over 2967842.58 frames. ], batch size: 70, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:27:30,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. 
limit=15.0 2024-08-11 15:27:39,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1162480.0, ans=0.125 2024-08-11 15:27:41,640 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-11 15:27:47,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1162480.0, ans=0.125 2024-08-11 15:27:59,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1162580.0, ans=0.0 2024-08-11 15:28:09,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1162680.0, ans=0.0 2024-08-11 15:28:18,243 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 17 from Vox, 43 from AS 2024-08-11 15:28:25,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1162780.0, ans=0.0 2024-08-11 15:28:37,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.52 vs. limit=10.0 2024-08-11 15:28:39,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 350, loss[loss=0.08415, beats_loss=0.01372, ecapa_loss=0.0002025, whisper_loss=0.06841, over 19601.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001929, whisper_loss=0.0918, over 3168092.62 frames. 
], batch size: 80, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:28:39,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1162880.0, ans=0.125 2024-08-11 15:28:48,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1162880.0, ans=0.09899494936611666 2024-08-11 15:29:00,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.571e+01 3.026e+01 3.460e+01 5.079e+01, threshold=6.051e+01, percent-clipped=0.0 2024-08-11 15:29:18,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1163080.0, ans=0.5 2024-08-11 15:29:27,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1163180.0, ans=0.1 2024-08-11 15:29:38,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1163280.0, ans=0.125 2024-08-11 15:29:40,242 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 from AS 2024-08-11 15:29:50,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-11 15:29:50,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 400, loss[loss=0.1067, beats_loss=0.01136, ecapa_loss=0.0002029, whisper_loss=0.09329, over 15423.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01109, ecapa_loss=0.0001917, whisper_loss=0.09074, over 3283804.49 frames. 
], batch size: 63, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:30:04,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163480.0, ans=0.125 2024-08-11 15:30:10,582 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 from AS 2024-08-11 15:30:19,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-11 15:30:24,912 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 15:30:26,202 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 17 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 15:30:32,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-08-11 15:30:40,103 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 8 from Vox, 28 from AS 2024-08-11 15:30:48,803 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 15:30:49,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1163780.0, ans=0.025 2024-08-11 15:30:52,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1163780.0, ans=0.0 2024-08-11 15:31:00,261 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 from AS 2024-08-11 15:31:01,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 450, loss[loss=0.09582, beats_loss=0.0122, ecapa_loss=0.0002212, whisper_loss=0.08141, over 19125.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01109, ecapa_loss=0.0001898, whisper_loss=0.09082, over 3392149.09 frames. 
], batch size: 80, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:31:03,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1163880.0, ans=0.125 2024-08-11 15:31:04,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1163880.0, ans=0.5 2024-08-11 15:31:07,274 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 15:31:22,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.636e+01 2.915e+01 3.353e+01 5.482e+01, threshold=5.829e+01, percent-clipped=0.0 2024-08-11 15:31:26,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1163980.0, ans=0.2 2024-08-11 15:31:37,927 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 from AS 2024-08-11 15:32:09,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1164280.0, ans=0.125 2024-08-11 15:32:11,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1164280.0, ans=0.125 2024-08-11 15:32:14,350 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 500, loss[loss=0.1137, beats_loss=0.00748, ecapa_loss=0.0002559, whisper_loss=0.1036, over 16833.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01107, ecapa_loss=0.0001891, whisper_loss=0.09142, over 3493381.86 frames. ], batch size: 69, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:32:14,660 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 11 from Vox, 37 from AS 2024-08-11 15:32:21,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.14 vs. 
limit=22.5 2024-08-11 15:32:33,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1164480.0, ans=0.1 2024-08-11 15:32:35,943 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 30 from Vox, 23 from AS 2024-08-11 15:32:41,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1164480.0, ans=0.0 2024-08-11 15:32:46,626 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 from AS 2024-08-11 15:33:06,240 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 from AS 2024-08-11 15:33:11,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1164780.0, ans=0.2 2024-08-11 15:33:13,507 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 from AS 2024-08-11 15:33:19,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-11 15:33:26,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 550, loss[loss=0.105, beats_loss=0.01026, ecapa_loss=0.0002082, whisper_loss=0.09263, over 21322.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01106, ecapa_loss=0.0001893, whisper_loss=0.09235, over 3559332.72 frames. ], batch size: 88, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:33:27,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1164880.0, ans=0.125 2024-08-11 15:33:48,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.637e+01 3.008e+01 3.365e+01 4.595e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 15:33:54,229 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
13 from LS+wenet, 21 from Vox, 21 from AS 2024-08-11 15:33:56,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-11 15:34:04,143 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 from AS 2024-08-11 15:34:06,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2024-08-11 15:34:10,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1165180.0, ans=0.0 2024-08-11 15:34:38,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 600, loss[loss=0.111, beats_loss=0.01304, ecapa_loss=0.0001384, whisper_loss=0.09659, over 19353.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01107, ecapa_loss=0.0001888, whisper_loss=0.09301, over 3603192.40 frames. ], batch size: 75, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:34:49,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1165380.0, ans=0.0 2024-08-11 15:35:07,330 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 from AS 2024-08-11 15:35:11,377 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 from AS 2024-08-11 15:35:22,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.09 vs. 
limit=22.5 2024-08-11 15:35:24,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1165680.0, ans=0.125 2024-08-11 15:35:27,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1165680.0, ans=0.125 2024-08-11 15:35:31,971 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-11 15:35:32,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=22.5 2024-08-11 15:35:46,016 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 from AS 2024-08-11 15:35:46,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1165780.0, ans=0.1 2024-08-11 15:35:53,114 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 650, loss[loss=0.1095, beats_loss=0.01023, ecapa_loss=0.0001706, whisper_loss=0.09757, over 16253.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01103, ecapa_loss=0.0001893, whisper_loss=0.09327, over 3657204.34 frames. ], batch size: 62, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:36:05,579 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 18 from LS+wenet, 23 from Vox, 46 from AS 2024-08-11 15:36:07,782 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 15:36:16,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.692e+01 3.015e+01 3.566e+01 6.762e+01, threshold=6.030e+01, percent-clipped=2.0 2024-08-11 15:36:29,444 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
26 from LS+wenet, 18 from Vox, 29 from AS 2024-08-11 15:37:13,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 700, loss[loss=0.1239, beats_loss=0.01107, ecapa_loss=0.0001625, whisper_loss=0.1113, over 14619.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001889, whisper_loss=0.09337, over 3677751.50 frames. ], batch size: 53, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:37:17,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1166380.0, ans=0.125 2024-08-11 15:37:22,583 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:37:30,795 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 15:38:07,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1166680.0, ans=0.1 2024-08-11 15:38:35,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 750, loss[loss=0.09866, beats_loss=0.01127, ecapa_loss=0.0002282, whisper_loss=0.08511, over 21049.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01104, ecapa_loss=0.0001887, whisper_loss=0.09333, over 3665268.27 frames. 
], batch size: 88, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:38:37,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1166880.0, ans=0.07 2024-08-11 15:38:40,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1166880.0, ans=0.1 2024-08-11 15:38:42,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1166880.0, ans=0.0 2024-08-11 15:39:00,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.889e+01 3.485e+01 5.934e+01, threshold=5.777e+01, percent-clipped=0.0 2024-08-11 15:39:41,746 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 15:40:00,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 800, loss[loss=0.1222, beats_loss=0.009029, ecapa_loss=0.0002037, whisper_loss=0.1111, over 21736.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001896, whisper_loss=0.09302, over 3731886.50 frames. ], batch size: 83, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:40:08,189 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 from AS 2024-08-11 15:40:14,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1167380.0, ans=0.0 2024-08-11 15:40:17,950 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 from AS 2024-08-11 15:40:22,861 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 13 from Vox, 42 from AS 2024-08-11 15:40:27,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. 
limit=22.5 2024-08-11 15:40:54,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1167680.0, ans=0.2 2024-08-11 15:41:13,141 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 from AS 2024-08-11 15:41:23,868 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 15:41:25,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 850, loss[loss=0.1117, beats_loss=0.01098, ecapa_loss=0.0001863, whisper_loss=0.09883, over 18463.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.0001888, whisper_loss=0.09283, over 3765164.68 frames. ], batch size: 72, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:41:34,883 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-11 15:41:47,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1167980.0, ans=0.2 2024-08-11 15:41:52,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.648e+01 3.009e+01 3.325e+01 6.049e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-11 15:42:13,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1168080.0, ans=0.1 2024-08-11 15:42:33,790 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS 2024-08-11 15:42:36,357 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 15:42:40,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1168280.0, ans=0.125 2024-08-11 15:42:50,108 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 900, loss[loss=0.1163, beats_loss=0.009466, ecapa_loss=0.0001783, whisper_loss=0.105, over 18412.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001875, whisper_loss=0.09253, over 3761676.89 frames. ], batch size: 68, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:43:25,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1168580.0, ans=0.125 2024-08-11 15:43:39,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-08-11 15:43:53,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1168680.0, ans=0.125 2024-08-11 15:44:15,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 950, loss[loss=0.08054, beats_loss=0.01363, ecapa_loss=0.0001946, whisper_loss=0.06496, over 19134.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01116, ecapa_loss=0.000188, whisper_loss=0.09146, over 3752351.71 frames. ], batch size: 78, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:44:33,829 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 15:44:35,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1168980.0, ans=0.0 2024-08-11 15:44:42,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.663e+01 2.966e+01 3.403e+01 1.009e+02, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 15:44:51,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1169080.0, ans=0.125 2024-08-11 15:44:51,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1169080.0, ans=0.125 2024-08-11 15:44:56,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1169080.0, ans=0.125 2024-08-11 15:44:57,930 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 15:45:01,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=12.0 2024-08-11 15:45:14,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1169180.0, ans=0.125 2024-08-11 15:45:16,355 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-11 15:45:29,789 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 15:45:37,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1000, loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.0001932, whisper_loss=0.09251, over 15051.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01118, ecapa_loss=0.0001871, whisper_loss=0.09208, over 3795630.78 frames. 
], batch size: 59, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:46:46,326 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-11 15:46:48,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1169780.0, ans=0.125 2024-08-11 15:46:53,091 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 15:47:00,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1050, loss[loss=0.1093, beats_loss=0.01449, ecapa_loss=0.0001358, whisper_loss=0.09343, over 22377.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.0001874, whisper_loss=0.09215, over 3780889.96 frames. ], batch size: 87, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:47:10,984 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 15:47:22,052 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 15:47:29,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.579e+01 2.847e+01 3.241e+01 6.261e+01, threshold=5.695e+01, percent-clipped=1.0 2024-08-11 15:47:48,965 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 15:48:10,311 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 15:48:32,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2024-08-11 15:48:32,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1100, loss[loss=0.0793, beats_loss=0.01442, ecapa_loss=0.0001457, whisper_loss=0.06342, over 16946.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001882, whisper_loss=0.09256, over 3790011.57 frames. 
], batch size: 68, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:48:35,947 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.880e-01 2024-08-11 15:48:37,021 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 15:48:48,431 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 15:48:48,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1170480.0, ans=0.125 2024-08-11 15:48:50,536 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 15:49:03,398 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 15:49:04,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2024-08-11 15:49:33,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1170680.0, ans=0.2 2024-08-11 15:49:58,945 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1150, loss[loss=0.1052, beats_loss=0.01405, ecapa_loss=0.0002116, whisper_loss=0.08906, over 20466.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001872, whisper_loss=0.09257, over 3821078.95 frames. ], batch size: 86, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:50:01,234 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.29 vs. 
limit=22.5 2024-08-11 15:50:25,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.574e+01 2.982e+01 3.415e+01 5.178e+01, threshold=5.965e+01, percent-clipped=0.0 2024-08-11 15:50:35,237 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 15:50:44,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1171080.0, ans=0.0 2024-08-11 15:50:44,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1171080.0, ans=0.0 2024-08-11 15:50:47,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171180.0, ans=0.1 2024-08-11 15:51:16,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1171280.0, ans=0.125 2024-08-11 15:51:19,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1171380.0, ans=0.2 2024-08-11 15:51:19,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1171380.0, ans=0.125 2024-08-11 15:51:20,650 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1200, loss[loss=0.1086, beats_loss=0.008453, ecapa_loss=0.0002107, whisper_loss=0.09802, over 21737.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01101, ecapa_loss=0.0001892, whisper_loss=0.09303, over 3808823.59 frames. ], batch size: 87, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:51:41,692 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 15:51:49,597 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
19 from LS+wenet, 23 from Vox, 54 fro AS 2024-08-11 15:51:59,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171580.0, ans=0.1 2024-08-11 15:52:01,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1171580.0, ans=0.0 2024-08-11 15:52:06,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1171580.0, ans=0.125 2024-08-11 15:52:17,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1171680.0, ans=0.125 2024-08-11 15:52:30,657 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 15:52:37,223 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 15:52:42,267 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1250, loss[loss=0.09839, beats_loss=0.01063, ecapa_loss=0.0001993, whisper_loss=0.08577, over 21352.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001879, whisper_loss=0.09288, over 3812139.31 frames. ], batch size: 88, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:52:42,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1171880.0, ans=6.0 2024-08-11 15:52:43,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1171880.0, ans=0.1 2024-08-11 15:53:00,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1171980.0, ans=0.125 2024-08-11 15:53:01,364 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 15:53:05,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1171980.0, ans=0.125 2024-08-11 15:53:06,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1171980.0, ans=0.0 2024-08-11 15:53:07,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 3.089e+01 3.473e+01 5.447e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 15:53:11,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1171980.0, ans=0.125 2024-08-11 15:54:02,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1300, loss[loss=0.104, beats_loss=0.01154, ecapa_loss=0.0001789, whisper_loss=0.09065, over 20055.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01113, ecapa_loss=0.0001879, whisper_loss=0.09248, over 3826473.32 frames. ], batch size: 79, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:54:12,251 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 15:54:14,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2024-08-11 15:54:23,808 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 15:54:30,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. 
limit=15.0 2024-08-11 15:54:50,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172680.0, ans=0.1 2024-08-11 15:54:52,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-08-11 15:54:53,025 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 11 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 15:54:53,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2024-08-11 15:55:06,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2024-08-11 15:55:22,776 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1350, loss[loss=0.1027, beats_loss=0.01325, ecapa_loss=0.0001893, whisper_loss=0.08751, over 19424.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01118, ecapa_loss=0.000186, whisper_loss=0.09205, over 3802924.69 frames. ], batch size: 79, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:55:36,874 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 15:55:48,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1172980.0, ans=0.125 2024-08-11 15:55:51,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.580e+01 3.028e+01 3.578e+01 5.392e+01, threshold=6.056e+01, percent-clipped=0.0 2024-08-11 15:55:59,912 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 15:56:02,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=22.5 2024-08-11 15:56:04,941 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 15:56:05,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-11 15:56:22,985 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 15:56:50,291 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1400, loss[loss=0.1122, beats_loss=0.01138, ecapa_loss=0.0001857, whisper_loss=0.09893, over 22922.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01122, ecapa_loss=0.0001854, whisper_loss=0.09153, over 3827558.60 frames. ], batch size: 89, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:57:01,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1173380.0, ans=0.1 2024-08-11 15:57:31,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=8.0 2024-08-11 15:58:03,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1173780.0, ans=0.125 2024-08-11 15:58:08,248 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 15:58:13,201 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1450, loss[loss=0.09002, beats_loss=0.009707, ecapa_loss=0.0001867, whisper_loss=0.07845, over 17463.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01131, ecapa_loss=0.0001836, whisper_loss=0.09068, over 3822170.33 frames. 
], batch size: 69, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:59:09,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.580e+01 2.876e+01 3.331e+01 4.704e+01, threshold=5.752e+01, percent-clipped=0.0 2024-08-11 15:59:13,195 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 15:59:20,013 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2024-08-11 15:59:25,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2024-08-11 15:59:26,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1174080.0, ans=0.1 2024-08-11 15:59:34,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1174180.0, ans=0.07 2024-08-11 15:59:39,120 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 15:59:45,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1174180.0, ans=0.2 2024-08-11 16:00:06,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1500, loss[loss=0.1163, beats_loss=0.009474, ecapa_loss=0.0002358, whisper_loss=0.1045, over 20278.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01133, ecapa_loss=0.0001845, whisper_loss=0.09045, over 3820247.30 frames. ], batch size: 82, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:00:30,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1174480.0, ans=0.5 2024-08-11 16:00:39,376 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 16:00:58,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1174680.0, ans=0.2 2024-08-11 16:01:26,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1550, loss[loss=0.1048, beats_loss=0.009171, ecapa_loss=0.0002165, whisper_loss=0.09344, over 17970.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01136, ecapa_loss=0.0001846, whisper_loss=0.09031, over 3808946.80 frames. ], batch size: 71, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:01:33,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1174880.0, ans=0.0 2024-08-11 16:01:52,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.587e+01 2.923e+01 3.490e+01 5.175e+01, threshold=5.845e+01, percent-clipped=0.0 2024-08-11 16:01:59,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1175080.0, ans=0.0 2024-08-11 16:02:26,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2024-08-11 16:02:35,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1175280.0, ans=0.0 2024-08-11 16:02:36,436 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 16:02:40,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1175280.0, ans=0.125 2024-08-11 16:02:43,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1600, loss[loss=0.08417, beats_loss=0.01632, ecapa_loss=0.0001373, whisper_loss=0.06648, over 20967.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01129, ecapa_loss=0.0001849, whisper_loss=0.09016, over 3809409.61 frames. ], batch size: 86, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:02:44,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1175380.0, ans=0.125 2024-08-11 16:02:44,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1175380.0, ans=0.125 2024-08-11 16:02:46,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1175380.0, ans=0.125 2024-08-11 16:02:51,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-08-11 16:02:53,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1175380.0, ans=0.05 2024-08-11 16:02:54,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1175380.0, ans=0.125 2024-08-11 16:03:11,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1175480.0, ans=0.0 2024-08-11 16:03:16,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.28 vs. limit=15.0 2024-08-11 16:03:18,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1175580.0, ans=0.125 2024-08-11 16:03:21,155 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
33 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 16:03:32,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1175680.0, ans=0.0 2024-08-11 16:03:34,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1175680.0, ans=0.0 2024-08-11 16:03:36,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1175680.0, ans=0.125 2024-08-11 16:03:51,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1175780.0, ans=0.2 2024-08-11 16:03:52,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=12.0 2024-08-11 16:04:00,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1650, loss[loss=0.09701, beats_loss=0.0123, ecapa_loss=0.0002195, whisper_loss=0.08251, over 20819.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01123, ecapa_loss=0.0001851, whisper_loss=0.09119, over 3804184.07 frames. ], batch size: 87, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:04:01,790 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-11 16:04:12,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1175880.0, ans=0.125 2024-08-11 16:04:16,731 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 16:04:17,979 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 16:04:25,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.491e+01 2.765e+01 3.253e+01 5.216e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-11 16:04:31,801 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 16:04:33,263 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 16:04:35,038 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 16:04:40,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-11 16:04:54,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1176180.0, ans=0.1 2024-08-11 16:05:02,412 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:05:17,566 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1700, loss[loss=0.103, beats_loss=0.01004, ecapa_loss=0.0002101, whisper_loss=0.09086, over 15070.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001849, whisper_loss=0.09222, over 3798444.44 frames. 
], batch size: 60, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:05:18,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1176380.0, ans=0.0 2024-08-11 16:05:27,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1176380.0, ans=0.125 2024-08-11 16:05:33,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1176480.0, ans=0.125 2024-08-11 16:05:49,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1176580.0, ans=0.2 2024-08-11 16:05:55,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1176580.0, ans=0.125 2024-08-11 16:05:57,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1176580.0, ans=0.125 2024-08-11 16:06:14,024 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-11 16:06:23,566 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-11 16:06:24,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-08-11 16:06:25,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1176780.0, ans=0.0 2024-08-11 16:06:30,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1750, loss[loss=0.09706, beats_loss=0.01134, ecapa_loss=0.0001798, whisper_loss=0.08392, over 15465.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.000184, whisper_loss=0.09261, over 3814543.31 frames. 
], batch size: 61, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:06:31,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-11 16:06:35,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5 2024-08-11 16:06:52,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1176980.0, ans=0.0 2024-08-11 16:06:54,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.634e+01 3.052e+01 3.436e+01 4.631e+01, threshold=6.105e+01, percent-clipped=0.0 2024-08-11 16:07:08,099 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 19 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-11 16:07:11,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1177080.0, ans=0.125 2024-08-11 16:07:13,953 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 16:07:15,617 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 16:07:18,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1177180.0, ans=0.09899494936611666 2024-08-11 16:07:18,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1177180.0, ans=0.125 2024-08-11 16:07:22,535 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 16:07:28,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1177280.0, ans=0.2 2024-08-11 16:07:31,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=12.0 2024-08-11 16:07:42,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1800, loss[loss=0.106, beats_loss=0.01193, ecapa_loss=0.0001744, whisper_loss=0.09233, over 17514.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0111, ecapa_loss=0.0001835, whisper_loss=0.0925, over 3805845.58 frames. ], batch size: 69, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:07:57,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1177480.0, ans=0.125 2024-08-11 16:07:57,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1177480.0, ans=0.05 2024-08-11 16:08:11,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1177580.0, ans=0.2 2024-08-11 16:08:12,437 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 16:08:18,722 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 16:08:21,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1177580.0, ans=0.0 2024-08-11 16:08:22,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.48 vs. limit=15.0 2024-08-11 16:08:38,653 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
11 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 16:08:54,325 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1850, loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.000206, whisper_loss=0.08994, over 20037.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01105, ecapa_loss=0.0001847, whisper_loss=0.09251, over 3785295.11 frames. ], batch size: 77, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:09:11,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1177980.0, ans=0.0 2024-08-11 16:09:18,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.637e+01 3.046e+01 3.560e+01 5.616e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 16:09:34,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1178080.0, ans=0.0 2024-08-11 16:09:43,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1178180.0, ans=0.0 2024-08-11 16:09:50,345 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 16:10:07,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1900, loss[loss=0.1034, beats_loss=0.01464, ecapa_loss=0.0001577, whisper_loss=0.0872, over 17225.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01113, ecapa_loss=0.0001859, whisper_loss=0.0922, over 3803031.20 frames. ], batch size: 67, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:10:08,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.95 vs. 
limit=15.0 2024-08-11 16:10:12,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1178380.0, ans=0.125 2024-08-11 16:10:23,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1178480.0, ans=0.09899494936611666 2024-08-11 16:10:31,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1178480.0, ans=0.1 2024-08-11 16:10:32,391 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.314e-02 2024-08-11 16:10:34,748 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 16:10:39,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1178580.0, ans=0.0 2024-08-11 16:10:41,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1178580.0, ans=0.125 2024-08-11 16:10:43,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1178580.0, ans=0.2 2024-08-11 16:10:50,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1178580.0, ans=0.0 2024-08-11 16:10:51,886 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 16:11:09,311 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 16:11:16,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1178780.0, ans=0.125 2024-08-11 16:11:18,928 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 16:11:22,014 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 1950, loss[loss=0.1256, beats_loss=0.008859, ecapa_loss=0.0001963, whisper_loss=0.1147, over 15632.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01106, ecapa_loss=0.0001881, whisper_loss=0.09254, over 3767070.54 frames. ], batch size: 59, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:11:39,611 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 16:11:45,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.606e+01 2.950e+01 3.514e+01 8.174e+01, threshold=5.900e+01, percent-clipped=2.0 2024-08-11 16:11:58,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-11 16:12:12,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1179180.0, ans=0.125 2024-08-11 16:12:20,208 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 19 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 16:12:28,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179280.0, ans=0.1 2024-08-11 16:12:31,152 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:12:36,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2000, loss[loss=0.09508, beats_loss=0.01251, ecapa_loss=0.0001952, whisper_loss=0.08061, over 23255.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001903, whisper_loss=0.09248, over 3794128.94 frames. 
], batch size: 95, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:13:02,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179480.0, ans=0.1 2024-08-11 16:13:04,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1179480.0, ans=0.125 2024-08-11 16:13:08,175 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:13:09,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-11 16:13:22,332 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 16:13:28,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-11 16:13:29,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-08-11 16:13:37,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-08-11 16:13:44,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1179780.0, ans=0.0 2024-08-11 16:13:53,178 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2050, loss[loss=0.1006, beats_loss=0.01319, ecapa_loss=0.0001213, whisper_loss=0.08618, over 22599.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001908, whisper_loss=0.09246, over 3798985.67 frames. 
], batch size: 84, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:13:58,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1179880.0, ans=0.1 2024-08-11 16:14:09,596 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 16:14:14,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1179980.0, ans=0.125 2024-08-11 16:14:18,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.671e+01 2.965e+01 3.227e+01 2.393e+02, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 16:15:08,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1180280.0, ans=0.125 2024-08-11 16:15:14,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2100, loss[loss=0.06622, beats_loss=0.01291, ecapa_loss=0.0001798, whisper_loss=0.05151, over 13280.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001904, whisper_loss=0.09129, over 3744035.30 frames. ], batch size: 55, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:15:16,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1180380.0, ans=0.125 2024-08-11 16:16:04,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1180680.0, ans=0.0 2024-08-11 16:16:23,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1180780.0, ans=0.125 2024-08-11 16:16:24,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=10.0 2024-08-11 16:16:37,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2150, loss[loss=0.0843, beats_loss=0.01305, ecapa_loss=0.0001391, whisper_loss=0.06986, over 16729.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001891, whisper_loss=0.09159, over 3744619.19 frames. ], batch size: 63, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:16:41,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1180880.0, ans=0.125 2024-08-11 16:16:49,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1180880.0, ans=0.125 2024-08-11 16:16:51,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.25 vs. limit=15.0 2024-08-11 16:16:57,203 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 16 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 16:17:00,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1180980.0, ans=0.125 2024-08-11 16:17:03,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.740e+01 2.984e+01 3.481e+01 5.761e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 16:17:14,302 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 32 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 16:17:21,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2024-08-11 16:17:22,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-08-11 16:17:46,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1181280.0, ans=0.1 2024-08-11 16:17:50,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-11 16:17:54,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1181280.0, ans=0.0 2024-08-11 16:17:56,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1181280.0, ans=0.125 2024-08-11 16:18:01,086 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 16:18:02,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2200, loss[loss=0.1031, beats_loss=0.01363, ecapa_loss=0.0002256, whisper_loss=0.08718, over 21165.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01128, ecapa_loss=0.0001884, whisper_loss=0.09176, over 3762439.36 frames. ], batch size: 92, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:18:02,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1181380.0, ans=0.125 2024-08-11 16:18:12,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1181380.0, ans=0.125 2024-08-11 16:18:13,755 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 16:18:15,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1181380.0, ans=0.0 2024-08-11 16:18:15,420 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.452e-01 2024-08-11 16:18:39,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-11 16:18:58,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-08-11 16:19:03,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=1181680.0, ans=15.0 2024-08-11 16:19:10,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1181780.0, ans=0.125 2024-08-11 16:19:24,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2250, loss[loss=0.09381, beats_loss=0.01095, ecapa_loss=0.0001638, whisper_loss=0.08122, over 18621.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01134, ecapa_loss=0.0001899, whisper_loss=0.09251, over 3788292.27 frames. ], batch size: 71, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:19:36,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1181880.0, ans=0.2 2024-08-11 16:19:50,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.696e+01 3.022e+01 3.450e+01 8.988e+01, threshold=6.044e+01, percent-clipped=1.0 2024-08-11 16:19:57,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. 
limit=15.0 2024-08-11 16:20:13,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0 2024-08-11 16:20:14,164 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 16:20:20,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1182180.0, ans=0.015 2024-08-11 16:20:25,464 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 35 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 16:20:26,985 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 16:20:28,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1182280.0, ans=0.2 2024-08-11 16:20:45,163 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2300, loss[loss=0.1128, beats_loss=0.00998, ecapa_loss=0.0001958, whisper_loss=0.1009, over 21973.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.000191, whisper_loss=0.0928, over 3809412.27 frames. ], batch size: 88, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:20:48,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1182380.0, ans=0.1 2024-08-11 16:20:53,010 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
33 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 16:21:12,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1182480.0, ans=0.1 2024-08-11 16:21:16,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1182580.0, ans=0.125 2024-08-11 16:21:17,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1182580.0, ans=0.125 2024-08-11 16:21:29,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1182580.0, ans=0.05 2024-08-11 16:21:29,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1182580.0, ans=0.0 2024-08-11 16:21:53,996 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.810e+02 2024-08-11 16:21:58,955 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-11 16:22:05,730 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2350, loss[loss=0.07906, beats_loss=0.01637, ecapa_loss=0.0001864, whisper_loss=0.06083, over 17320.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01143, ecapa_loss=0.0001893, whisper_loss=0.09222, over 3858840.09 frames. ], batch size: 73, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:22:11,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. 
limit=22.5 2024-08-11 16:22:18,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1182880.0, ans=0.5 2024-08-11 16:22:21,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1182980.0, ans=0.0 2024-08-11 16:22:23,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1182980.0, ans=0.125 2024-08-11 16:22:29,961 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 16:22:34,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.605e+01 2.959e+01 3.391e+01 6.517e+01, threshold=5.918e+01, percent-clipped=1.0 2024-08-11 16:22:44,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1183080.0, ans=0.0 2024-08-11 16:22:48,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1183080.0, ans=0.0 2024-08-11 16:22:57,829 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 16:23:11,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-11 16:23:21,753 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 16:23:27,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1183280.0, ans=0.2 2024-08-11 16:23:30,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2400, loss[loss=0.1087, beats_loss=0.01078, ecapa_loss=0.0002073, whisper_loss=0.09581, over 22235.00 frames. 
], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.000192, whisper_loss=0.09289, over 3863663.28 frames. ], batch size: 91, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:23:30,674 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 16:23:42,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1183380.0, ans=0.125 2024-08-11 16:23:46,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1183480.0, ans=0.125 2024-08-11 16:23:46,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1183480.0, ans=0.2 2024-08-11 16:24:07,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.94 vs. limit=22.5 2024-08-11 16:24:22,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1183680.0, ans=0.1 2024-08-11 16:24:23,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-11 16:24:34,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1183680.0, ans=0.125 2024-08-11 16:24:49,020 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 16:24:52,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1183780.0, ans=0.0 2024-08-11 16:24:54,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1183880.0, ans=0.125 2024-08-11 16:24:55,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2450, loss[loss=0.1213, beats_loss=0.008611, ecapa_loss=0.0001744, whisper_loss=0.111, over 21448.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01123, ecapa_loss=0.0001915, whisper_loss=0.09331, over 3875009.29 frames. ], batch size: 81, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:25:05,038 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 16:25:20,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.638e+01 2.982e+01 3.407e+01 5.711e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 16:25:35,878 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 16:25:38,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-11 16:25:50,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-08-11 16:26:08,270 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 16:26:11,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1184280.0, ans=0.2 2024-08-11 16:26:12,481 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 16:26:12,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1184280.0, ans=0.2 2024-08-11 16:26:14,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-11 16:26:18,235 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2500, loss[loss=0.08613, beats_loss=0.01351, ecapa_loss=0.0001969, whisper_loss=0.07065, over 17551.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001906, whisper_loss=0.09261, over 3890651.43 frames. ], batch size: 70, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:26:25,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1184380.0, ans=0.125 2024-08-11 16:26:28,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2024-08-11 16:26:32,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1184380.0, ans=0.0 2024-08-11 16:26:42,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1184480.0, ans=0.125 2024-08-11 16:27:09,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1184680.0, ans=0.0 2024-08-11 16:27:38,172 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-11 16:27:45,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2550, loss[loss=0.09876, beats_loss=0.01023, ecapa_loss=0.0001949, whisper_loss=0.08658, over 18002.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01115, ecapa_loss=0.0001902, whisper_loss=0.09334, over 3867027.63 frames. ], batch size: 75, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:27:50,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184880.0, ans=0.1 2024-08-11 16:28:11,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.549e+01 2.871e+01 3.222e+01 4.395e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 16:28:18,077 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 16:28:31,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1185080.0, ans=0.0 2024-08-11 16:28:33,225 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 16:28:59,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1185280.0, ans=0.125 2024-08-11 16:29:03,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1185280.0, ans=0.0 2024-08-11 16:29:04,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-08-11 16:29:10,409 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2600, loss[loss=0.1218, beats_loss=0.008594, ecapa_loss=0.0002257, whisper_loss=0.1109, over 19321.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01112, ecapa_loss=0.0001922, whisper_loss=0.09376, over 3870143.78 frames. ], batch size: 80, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:29:20,163 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 16:29:21,631 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 16:29:38,375 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 16:30:10,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1185680.0, ans=0.2 2024-08-11 16:30:15,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1185680.0, ans=0.125 2024-08-11 16:30:24,408 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 16:30:34,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2650, loss[loss=0.09093, beats_loss=0.01246, ecapa_loss=0.0001696, whisper_loss=0.07678, over 19423.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01116, ecapa_loss=0.000193, whisper_loss=0.09353, over 3890542.67 frames. ], batch size: 76, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:30:43,744 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:30:47,833 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 16:30:49,674 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
17 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-11 16:31:01,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.668e+01 2.978e+01 3.517e+01 4.989e+01, threshold=5.956e+01, percent-clipped=0.0 2024-08-11 16:31:51,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1186280.0, ans=0.1 2024-08-11 16:31:52,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1186280.0, ans=0.125 2024-08-11 16:31:59,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2700, loss[loss=0.09943, beats_loss=0.01128, ecapa_loss=0.0002103, whisper_loss=0.08604, over 19550.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01125, ecapa_loss=0.0001927, whisper_loss=0.09239, over 3875806.31 frames. ], batch size: 80, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:32:01,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1186380.0, ans=0.125 2024-08-11 16:32:07,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1186380.0, ans=0.125 2024-08-11 16:32:18,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-08-11 16:32:19,730 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 16:32:50,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1186680.0, ans=0.0 2024-08-11 16:32:54,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1186680.0, ans=0.1 2024-08-11 16:33:06,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1186780.0, ans=0.0 2024-08-11 16:33:10,049 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:33:15,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1186780.0, ans=0.2 2024-08-11 16:33:16,708 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-11 16:33:20,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2750, loss[loss=0.08815, beats_loss=0.01139, ecapa_loss=0.0001641, whisper_loss=0.07512, over 15318.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001947, whisper_loss=0.09279, over 3845467.35 frames. ], batch size: 57, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:33:22,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1186880.0, ans=0.125 2024-08-11 16:33:29,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1186880.0, ans=0.0 2024-08-11 16:33:32,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.13 vs. 
limit=15.0 2024-08-11 16:33:35,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1186980.0, ans=0.035 2024-08-11 16:33:36,850 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 16:33:47,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.790e+01 3.167e+01 3.660e+01 5.593e+01, threshold=6.335e+01, percent-clipped=0.0 2024-08-11 16:33:53,251 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 16:34:20,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1187180.0, ans=0.0 2024-08-11 16:34:35,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1187280.0, ans=0.125 2024-08-11 16:34:42,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2800, loss[loss=0.08443, beats_loss=0.01613, ecapa_loss=0.0001552, whisper_loss=0.06675, over 13877.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001949, whisper_loss=0.09285, over 3841045.82 frames. ], batch size: 56, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:34:49,426 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 16:35:26,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187580.0, ans=0.1 2024-08-11 16:35:57,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2024-08-11 16:36:04,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2850, loss[loss=0.1157, beats_loss=0.01035, ecapa_loss=0.0001942, whisper_loss=0.1034, over 17349.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01119, ecapa_loss=0.0001952, whisper_loss=0.09302, over 3856516.18 frames. ], batch size: 71, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:36:13,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1187880.0, ans=0.125 2024-08-11 16:36:30,392 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.367e+00 2024-08-11 16:36:31,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.618e+01 2.962e+01 3.443e+01 5.615e+01, threshold=5.924e+01, percent-clipped=0.0 2024-08-11 16:36:37,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2024-08-11 16:36:38,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1188080.0, ans=0.02 2024-08-11 16:36:53,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1188180.0, ans=0.125 2024-08-11 16:37:16,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1188280.0, ans=0.07 2024-08-11 16:37:18,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-11 16:37:28,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2900, loss[loss=0.09226, beats_loss=0.01435, ecapa_loss=0.0001492, whisper_loss=0.07642, over 23648.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001959, whisper_loss=0.09268, over 3848907.59 frames. 
], batch size: 95, lr: 7.15e-03, grad_scale: 1.152921504606847e+18 2024-08-11 16:37:39,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1188380.0, ans=0.125 2024-08-11 16:37:43,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-11 16:37:59,429 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 16:38:18,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-11 16:38:25,037 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 16:38:25,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1188680.0, ans=0.125 2024-08-11 16:38:25,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1188680.0, ans=0.0 2024-08-11 16:38:37,485 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 16:38:40,139 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 16:38:41,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1188780.0, ans=0.0 2024-08-11 16:38:43,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 2950, loss[loss=0.1295, beats_loss=0.01068, ecapa_loss=0.0001967, whisper_loss=0.1169, over 19417.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01121, ecapa_loss=0.000197, whisper_loss=0.09349, over 3883224.95 frames. 
], batch size: 78, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:38:46,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1188880.0, ans=0.125 2024-08-11 16:38:55,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=34.84 vs. limit=22.5 2024-08-11 16:39:03,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=1188980.0, ans=0.1 2024-08-11 16:39:04,287 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 16:39:06,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.714e+01 3.075e+01 3.561e+01 5.736e+01, threshold=6.149e+01, percent-clipped=0.0 2024-08-11 16:39:08,140 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 16:39:19,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1189080.0, ans=0.1 2024-08-11 16:39:22,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1189080.0, ans=0.125 2024-08-11 16:39:23,279 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 16:39:51,284 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3000, loss[loss=0.0914, beats_loss=0.01074, ecapa_loss=0.0002028, whisper_loss=0.07863, over 17203.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01128, ecapa_loss=0.0001957, whisper_loss=0.0935, over 3903049.97 frames. 
], batch size: 69, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:39:51,285 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 16:40:32,749 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on ASR_libri: loss=0.2566, beats_loss=0, ecapa_loss=0.0006312, whisper_loss=0.2502, over 922467.00 frames. 2024-08-11 16:40:44,272 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5729e-03, 1.0495e-02, 1.0683e-02, 2.9360e+00, 3.1350e-03, 2.0934e-02, 3.5178e-02, 4.0988e-03], device='cuda:1') 2024-08-11 16:40:50,031 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on SV_voxceleb1: loss=0.005299, beats_loss=0, ecapa_loss=0.0005299, whisper_loss=0, over 939242.00 frames. 2024-08-11 16:42:48,095 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on AT_audioset: loss=0.02498, beats_loss=0.02498, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 16:42:48,098 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 16:42:52,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1189380.0, ans=0.0 2024-08-11 16:43:01,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1189480.0, ans=0.125 2024-08-11 16:43:15,984 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 34 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 16:43:20,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1189580.0, ans=0.125 2024-08-11 16:43:22,866 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 16:43:23,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1189580.0, ans=0.125 2024-08-11 16:43:25,674 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-11 16:43:44,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1189780.0, ans=0.1 2024-08-11 16:43:44,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1189780.0, ans=0.0 2024-08-11 16:43:47,247 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 16:43:51,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1189780.0, ans=0.125 2024-08-11 16:43:54,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3050, loss[loss=0.1007, beats_loss=0.01353, ecapa_loss=0.0001957, whisper_loss=0.08526, over 22640.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0001957, whisper_loss=0.09421, over 3923301.00 frames. ], batch size: 92, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:44:00,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-11 16:44:12,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1189980.0, ans=0.125 2024-08-11 16:44:16,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.646e+01 3.011e+01 3.406e+01 6.810e+01, threshold=6.022e+01, percent-clipped=0.0 2024-08-11 16:44:25,552 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-11 16:44:32,312 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-11 16:44:45,435 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 16:44:47,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1190280.0, ans=0.0 2024-08-11 16:44:52,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1190280.0, ans=0.125 2024-08-11 16:45:01,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3100, loss[loss=0.09374, beats_loss=0.009415, ecapa_loss=0.0002292, whisper_loss=0.08203, over 15449.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01126, ecapa_loss=0.0001974, whisper_loss=0.0934, over 3893168.14 frames. ], batch size: 60, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:45:13,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1190380.0, ans=0.125 2024-08-11 16:45:45,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=12.0 2024-08-11 16:45:47,332 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-11 16:46:08,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1190880.0, ans=0.125 2024-08-11 16:46:09,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3150, loss[loss=0.07632, beats_loss=0.01328, ecapa_loss=0.0001662, whisper_loss=0.06138, over 15468.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01127, ecapa_loss=0.0001961, whisper_loss=0.09367, over 3886752.41 frames. 
], batch size: 61, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:46:24,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1190980.0, ans=0.125 2024-08-11 16:46:31,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.615e+01 2.879e+01 3.586e+01 1.580e+02, threshold=5.758e+01, percent-clipped=2.0 2024-08-11 16:46:40,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1191080.0, ans=0.125 2024-08-11 16:46:44,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1191080.0, ans=0.0 2024-08-11 16:46:45,207 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 21 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 16:46:51,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2024-08-11 16:46:54,706 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:47:15,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3200, loss[loss=0.07709, beats_loss=0.01472, ecapa_loss=0.000188, whisper_loss=0.06049, over 21791.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0001953, whisper_loss=0.09337, over 3899795.63 frames. ], batch size: 94, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:47:19,740 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 16:47:21,291 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 16:47:28,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1191480.0, ans=0.1 2024-08-11 16:47:29,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1191480.0, ans=0.0 2024-08-11 16:47:39,641 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-08-11 16:47:52,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1191580.0, ans=0.125 2024-08-11 16:47:57,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-08-11 16:48:01,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1191680.0, ans=0.125 2024-08-11 16:48:07,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2024-08-11 16:48:07,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=15.0 2024-08-11 16:48:14,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1191780.0, ans=0.125 2024-08-11 16:48:22,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3250, loss[loss=0.1037, beats_loss=0.01028, ecapa_loss=0.0002577, whisper_loss=0.0908, over 21831.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01129, ecapa_loss=0.0001943, whisper_loss=0.0934, over 3901359.34 frames. 
], batch size: 95, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:48:27,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1191880.0, ans=0.0 2024-08-11 16:48:27,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1191880.0, ans=0.0 2024-08-11 16:48:27,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1191880.0, ans=0.0 2024-08-11 16:48:31,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-08-11 16:48:32,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1191880.0, ans=0.125 2024-08-11 16:48:34,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1191880.0, ans=0.1 2024-08-11 16:48:42,949 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 16:48:45,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.517e+01 2.867e+01 3.292e+01 6.213e+01, threshold=5.733e+01, percent-clipped=1.0 2024-08-11 16:48:51,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1192080.0, ans=0.1 2024-08-11 16:49:21,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1192280.0, ans=0.125 2024-08-11 16:49:22,178 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 16:49:29,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3300, loss[loss=0.0799, beats_loss=0.01373, ecapa_loss=0.0001662, whisper_loss=0.0645, over 13867.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01126, ecapa_loss=0.0001938, whisper_loss=0.0939, over 3892925.81 frames. ], batch size: 57, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:49:32,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=10.0 2024-08-11 16:49:45,880 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 16:50:05,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1192580.0, ans=0.035 2024-08-11 16:50:31,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-11 16:50:37,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3350, loss[loss=0.1092, beats_loss=0.01035, ecapa_loss=0.000191, whisper_loss=0.09695, over 16212.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01126, ecapa_loss=0.0001931, whisper_loss=0.09339, over 3877883.81 frames. ], batch size: 67, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:50:38,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1192880.0, ans=0.125 2024-08-11 16:50:38,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1192880.0, ans=0.125 2024-08-11 16:50:39,831 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 16:50:42,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1192880.0, ans=0.1 2024-08-11 16:50:46,252 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 16:50:55,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1192980.0, ans=0.2 2024-08-11 16:50:59,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.599e+01 2.933e+01 3.463e+01 7.726e+01, threshold=5.866e+01, percent-clipped=2.0 2024-08-11 16:51:17,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1193180.0, ans=0.125 2024-08-11 16:51:22,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1193180.0, ans=0.125 2024-08-11 16:51:28,368 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 16:51:35,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1193280.0, ans=15.0 2024-08-11 16:51:42,621 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3400, loss[loss=0.1151, beats_loss=0.01348, ecapa_loss=0.0001609, whisper_loss=0.09999, over 17500.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0113, ecapa_loss=0.0001942, whisper_loss=0.0936, over 3876910.36 frames. ], batch size: 68, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:51:47,992 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
17 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 16:51:48,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1193380.0, ans=0.125 2024-08-11 16:51:53,173 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 16:51:54,648 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 16 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 16:51:54,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1193480.0, ans=0.0 2024-08-11 16:52:02,515 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-11 16:52:03,686 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 16:52:05,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1193480.0, ans=0.125 2024-08-11 16:52:08,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:12,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:15,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1193580.0, ans=0.125 2024-08-11 16:52:26,854 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-11 16:52:40,189 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 16:52:48,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1193880.0, ans=0.0 2024-08-11 16:52:48,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3450, loss[loss=0.08671, beats_loss=0.01208, ecapa_loss=0.0001806, whisper_loss=0.07282, over 20679.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01136, ecapa_loss=0.000194, whisper_loss=0.09282, over 3887585.54 frames. ], batch size: 84, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:52:52,773 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 16:52:54,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-11 16:52:56,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1193880.0, ans=0.1 2024-08-11 16:52:58,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1193880.0, ans=0.0 2024-08-11 16:52:59,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1193880.0, ans=0.0 2024-08-11 16:53:11,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.578e+01 2.987e+01 3.563e+01 4.797e+01, threshold=5.975e+01, percent-clipped=0.0 2024-08-11 16:53:15,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1194080.0, ans=0.5 2024-08-11 16:53:22,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1194080.0, ans=0.125 2024-08-11 16:53:24,933 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 16:53:25,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1194080.0, ans=0.09899494936611666 2024-08-11 16:53:42,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1194280.0, ans=0.2 2024-08-11 16:53:42,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1194280.0, ans=0.125 2024-08-11 16:53:42,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1194280.0, ans=0.0 2024-08-11 16:53:54,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3500, loss[loss=0.08976, beats_loss=0.01291, ecapa_loss=0.0001893, whisper_loss=0.07495, over 22101.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0114, ecapa_loss=0.0001927, whisper_loss=0.09266, over 3887436.49 frames. ], batch size: 89, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:53:54,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1194380.0, ans=0.125 2024-08-11 16:54:02,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1194380.0, ans=0.125 2024-08-11 16:54:05,335 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 16:54:20,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. 
limit=10.0 2024-08-11 16:54:27,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1194580.0, ans=0.1 2024-08-11 16:54:32,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1194680.0, ans=0.125 2024-08-11 16:54:45,437 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 16:54:47,003 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 16:54:53,225 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 16:54:59,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-08-11 16:55:00,073 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3550, loss[loss=0.1191, beats_loss=0.01061, ecapa_loss=0.0001921, whisper_loss=0.1065, over 23470.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01126, ecapa_loss=0.0001928, whisper_loss=0.09359, over 3900152.43 frames. ], batch size: 90, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:55:00,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1194880.0, ans=15.0 2024-08-11 16:55:22,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.768e+01 2.986e+01 3.532e+01 5.359e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-11 16:55:24,463 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 16:55:27,231 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 16:55:29,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1195080.0, ans=0.125 2024-08-11 16:55:41,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1195180.0, ans=0.0 2024-08-11 16:55:42,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1195180.0, ans=0.125 2024-08-11 16:55:42,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1195180.0, ans=0.125 2024-08-11 16:55:42,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1195180.0, ans=0.07 2024-08-11 16:55:44,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1195180.0, ans=0.0 2024-08-11 16:55:45,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-11 16:56:07,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3600, loss[loss=0.1211, beats_loss=0.01058, ecapa_loss=0.0002042, whisper_loss=0.1085, over 23479.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01124, ecapa_loss=0.0001937, whisper_loss=0.09337, over 3907196.66 frames. ], batch size: 91, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:56:08,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1195380.0, ans=0.125 2024-08-11 16:56:11,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-08-11 16:56:19,759 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 16:56:20,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1195480.0, ans=0.125 2024-08-11 16:56:55,599 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 16:57:13,822 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3650, loss[loss=0.1272, beats_loss=0.009003, ecapa_loss=0.0002309, whisper_loss=0.1159, over 18332.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01123, ecapa_loss=0.0001939, whisper_loss=0.09319, over 3863782.65 frames. ], batch size: 76, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:57:14,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1195880.0, ans=0.0 2024-08-11 16:57:19,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1195880.0, ans=0.1 2024-08-11 16:57:34,959 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 16:57:36,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.649e+01 3.037e+01 3.697e+01 5.413e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 16:57:43,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1196080.0, ans=0.0 2024-08-11 16:57:47,428 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 16:58:03,987 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 16:58:05,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1196180.0, ans=0.1 2024-08-11 16:58:16,087 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 16:58:16,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-11 16:58:21,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3700, loss[loss=0.1265, beats_loss=0.01162, ecapa_loss=0.0001814, whisper_loss=0.1131, over 23154.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01118, ecapa_loss=0.0001944, whisper_loss=0.09319, over 3851714.83 frames. ], batch size: 92, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:58:29,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2024-08-11 16:58:49,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1196580.0, ans=0.125 2024-08-11 16:58:51,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. 
limit=15.0 2024-08-11 16:58:54,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1196580.0, ans=0.125 2024-08-11 16:59:01,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1196680.0, ans=0.125 2024-08-11 16:59:03,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1196680.0, ans=0.125 2024-08-11 16:59:08,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1196680.0, ans=0.125 2024-08-11 16:59:12,148 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 16:59:19,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=12.0 2024-08-11 16:59:27,705 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3750, loss[loss=0.09052, beats_loss=0.01264, ecapa_loss=0.0001863, whisper_loss=0.07601, over 21682.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01132, ecapa_loss=0.0001927, whisper_loss=0.09221, over 3853195.35 frames. ], batch size: 89, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:59:27,897 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 16:59:30,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1196880.0, ans=0.1 2024-08-11 16:59:39,697 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-11 16:59:44,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1196980.0, ans=0.0 2024-08-11 16:59:47,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1196980.0, ans=0.125 2024-08-11 16:59:50,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.626e+01 2.806e+01 3.237e+01 4.971e+01, threshold=5.612e+01, percent-clipped=0.0 2024-08-11 17:00:02,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1197080.0, ans=0.125 2024-08-11 17:00:19,607 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 17:00:31,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=22.5 2024-08-11 17:00:34,269 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3800, loss[loss=0.08316, beats_loss=0.01368, ecapa_loss=0.0001788, whisper_loss=0.06769, over 13626.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01144, ecapa_loss=0.0001938, whisper_loss=0.09133, over 3874966.84 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:00:34,484 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 17:00:37,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1197380.0, ans=0.125 2024-08-11 17:00:38,180 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 17:01:02,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1197580.0, ans=0.0 2024-08-11 17:01:12,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1197580.0, ans=0.125 2024-08-11 17:01:13,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1197680.0, ans=0.04949747468305833 2024-08-11 17:01:20,829 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 17:01:26,675 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 17:01:28,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-08-11 17:01:39,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-08-11 17:01:40,852 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3850, loss[loss=0.09918, beats_loss=0.01235, ecapa_loss=0.0002044, whisper_loss=0.08479, over 17667.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01146, ecapa_loss=0.0001938, whisper_loss=0.09155, over 3886967.10 frames. 
], batch size: 72, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:01:46,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1197880.0, ans=0.125 2024-08-11 17:02:03,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.720e+01 3.010e+01 3.419e+01 7.200e+01, threshold=6.020e+01, percent-clipped=2.0 2024-08-11 17:02:08,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1198080.0, ans=0.0 2024-08-11 17:02:21,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1198180.0, ans=0.1 2024-08-11 17:02:29,282 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 17:02:36,849 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 17:02:47,634 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3900, loss[loss=0.1084, beats_loss=0.01268, ecapa_loss=0.0002072, whisper_loss=0.09368, over 16094.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01142, ecapa_loss=0.0001943, whisper_loss=0.09256, over 3877577.34 frames. ], batch size: 66, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:02:55,403 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 17:03:25,375 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 17:03:28,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1198680.0, ans=0.5 2024-08-11 17:03:33,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1198680.0, ans=0.125 2024-08-11 17:03:40,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1198780.0, ans=0.2 2024-08-11 17:03:50,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1198780.0, ans=0.125 2024-08-11 17:03:53,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 3950, loss[loss=0.1149, beats_loss=0.008571, ecapa_loss=0.0002323, whisper_loss=0.104, over 22223.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0001959, whisper_loss=0.09255, over 3875234.63 frames. ], batch size: 91, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:04:11,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1198980.0, ans=0.0 2024-08-11 17:04:14,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1198980.0, ans=0.04949747468305833 2024-08-11 17:04:15,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.737e+01 3.009e+01 3.546e+01 1.155e+02, threshold=6.019e+01, percent-clipped=1.0 2024-08-11 17:04:21,019 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 17:04:22,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0 2024-08-11 17:04:23,778 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 17:04:27,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1199080.0, ans=0.1 2024-08-11 17:04:28,354 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-11 17:04:39,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1199180.0, ans=0.0 2024-08-11 17:04:45,891 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 17:04:46,487 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.79 vs. limit=22.5 2024-08-11 17:04:48,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1199280.0, ans=0.125 2024-08-11 17:04:59,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1199380.0, ans=0.125 2024-08-11 17:05:00,923 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4000, loss[loss=0.0898, beats_loss=0.01452, ecapa_loss=0.0001645, whisper_loss=0.07363, over 22574.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.0001955, whisper_loss=0.09283, over 3880209.61 frames. 
], batch size: 92, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:05:37,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1199580.0, ans=0.0 2024-08-11 17:05:50,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1199680.0, ans=0.125 2024-08-11 17:06:01,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1199780.0, ans=0.0 2024-08-11 17:06:11,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4050, loss[loss=0.1116, beats_loss=0.01129, ecapa_loss=0.0001766, whisper_loss=0.09858, over 23876.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01124, ecapa_loss=0.0001954, whisper_loss=0.09344, over 3901793.57 frames. ], batch size: 95, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:06:33,370 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 17:06:37,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.884e+01 3.098e+01 3.625e+01 5.878e+01, threshold=6.196e+01, percent-clipped=0.0 2024-08-11 17:06:41,762 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 17:06:58,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1200180.0, ans=0.125 2024-08-11 17:06:59,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1200180.0, ans=0.125 2024-08-11 17:07:02,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1200180.0, ans=0.0 2024-08-11 17:07:23,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4100, loss[loss=0.08933, beats_loss=0.01159, ecapa_loss=0.0002306, whisper_loss=0.07543, over 14772.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01123, ecapa_loss=0.000196, whisper_loss=0.09337, over 3880027.86 frames. ], batch size: 61, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:07:49,621 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 17:07:52,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1200580.0, ans=0.1 2024-08-11 17:07:56,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1200580.0, ans=0.0 2024-08-11 17:08:10,768 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 29 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 17:08:17,599 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 17:08:21,964 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 17:08:30,018 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 17:08:32,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4150, loss[loss=0.1007, beats_loss=0.01208, ecapa_loss=0.0002163, whisper_loss=0.08647, over 21140.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0112, ecapa_loss=0.000196, whisper_loss=0.09389, over 3915272.15 frames. ], batch size: 86, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:08:32,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1200880.0, ans=0.1 2024-08-11 17:08:39,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-11 17:08:55,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.681e+01 3.148e+01 3.707e+01 5.413e+01, threshold=6.297e+01, percent-clipped=0.0 2024-08-11 17:09:42,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4200, loss[loss=0.1051, beats_loss=0.01407, ecapa_loss=0.0001712, whisper_loss=0.08928, over 20781.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0001973, whisper_loss=0.09285, over 3885500.87 frames. ], batch size: 83, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:09:50,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1201380.0, ans=0.125 2024-08-11 17:09:51,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1201380.0, ans=0.0 2024-08-11 17:10:01,118 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 17:10:10,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.28 vs. 
limit=22.5 2024-08-11 17:10:16,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1201580.0, ans=0.125 2024-08-11 17:10:20,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1201580.0, ans=0.1 2024-08-11 17:10:33,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1201680.0, ans=0.2 2024-08-11 17:10:48,867 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 17:10:50,267 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 38 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 17:10:52,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4250, loss[loss=0.104, beats_loss=0.009919, ecapa_loss=0.0002076, whisper_loss=0.09205, over 21233.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01122, ecapa_loss=0.0001968, whisper_loss=0.09327, over 3899189.78 frames. ], batch size: 90, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:10:54,104 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 17:10:56,817 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 17:11:12,615 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 17:11:16,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.599e+01 2.986e+01 3.415e+01 8.403e+01, threshold=5.972e+01, percent-clipped=2.0 2024-08-11 17:11:26,683 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:11:42,246 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 17:11:42,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-08-11 17:11:43,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1202180.0, ans=0.0 2024-08-11 17:11:48,700 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 11 from Vox, 53 fro AS 2024-08-11 17:12:01,426 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4300, loss[loss=0.1073, beats_loss=0.01223, ecapa_loss=0.0002034, whisper_loss=0.09308, over 18953.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01119, ecapa_loss=0.000196, whisper_loss=0.09343, over 3868596.67 frames. ], batch size: 77, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:12:26,115 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 17:12:38,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1202580.0, ans=0.0 2024-08-11 17:12:49,118 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 17:12:53,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1202680.0, ans=0.2 2024-08-11 17:12:55,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. limit=10.0 2024-08-11 17:12:58,003 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 17:13:05,410 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 17:13:07,268 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 17:13:10,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1202880.0, ans=0.125 2024-08-11 17:13:11,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4350, loss[loss=0.1024, beats_loss=0.009248, ecapa_loss=0.0002379, whisper_loss=0.09082, over 19511.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01111, ecapa_loss=0.0001965, whisper_loss=0.09377, over 3853729.12 frames. ], batch size: 85, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:13:19,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1202880.0, ans=0.125 2024-08-11 17:13:23,772 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 17:13:29,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1202980.0, ans=0.09899494936611666 2024-08-11 17:13:32,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1202980.0, ans=0.125 2024-08-11 17:13:35,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.556e+01 3.068e+01 3.501e+01 5.955e+01, threshold=6.137e+01, percent-clipped=0.0 2024-08-11 17:13:48,243 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 17:13:49,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203080.0, ans=0.1 2024-08-11 17:14:03,401 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
22 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-11 17:14:05,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1203180.0, ans=0.125 2024-08-11 17:14:09,053 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 17:14:21,396 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4400, loss[loss=0.1119, beats_loss=0.0105, ecapa_loss=0.0001617, whisper_loss=0.09975, over 15788.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0001961, whisper_loss=0.09328, over 3869912.36 frames. ], batch size: 59, lr: 7.11e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:14:24,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1203380.0, ans=0.125 2024-08-11 17:14:34,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1203480.0, ans=22.5 2024-08-11 17:14:47,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-11 17:15:02,430 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 17:15:34,266 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4450, loss[loss=0.09119, beats_loss=0.0137, ecapa_loss=0.0001672, whisper_loss=0.07582, over 21885.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01118, ecapa_loss=0.0001948, whisper_loss=0.09308, over 3849551.00 frames. ], batch size: 90, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:16:00,871 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 17:16:02,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.738e+01 3.141e+01 3.648e+01 6.257e+01, threshold=6.281e+01, percent-clipped=1.0 2024-08-11 17:16:08,990 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 17:16:15,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-11 17:16:18,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1204080.0, ans=0.0 2024-08-11 17:16:19,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1204080.0, ans=0.125 2024-08-11 17:16:24,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1204180.0, ans=0.2 2024-08-11 17:16:25,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1204180.0, ans=0.125 2024-08-11 17:16:26,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2024-08-11 17:16:54,228 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4500, loss[loss=0.1236, beats_loss=0.006342, ecapa_loss=0.000242, whisper_loss=0.1149, over 15324.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01117, ecapa_loss=0.0001941, whisper_loss=0.09256, over 3824793.86 frames. ], batch size: 60, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:17:02,423 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:17:04,222 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 17:17:12,903 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.251e+00 2024-08-11 17:17:34,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2024-08-11 17:17:44,516 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 17:17:51,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1204680.0, ans=0.0 2024-08-11 17:18:05,499 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 17:18:07,096 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 17:18:17,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4550, loss[loss=0.0894, beats_loss=0.01207, ecapa_loss=0.0001818, whisper_loss=0.0755, over 17517.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01114, ecapa_loss=0.0001955, whisper_loss=0.09302, over 3846658.40 frames. ], batch size: 71, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:18:18,823 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 17:18:23,669 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 17:18:35,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1204980.0, ans=0.1 2024-08-11 17:18:38,063 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 17:18:44,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.743e+01 3.155e+01 3.839e+01 5.758e+01, threshold=6.310e+01, percent-clipped=0.0 2024-08-11 17:18:51,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1205080.0, ans=0.1 2024-08-11 17:19:02,969 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 17:19:03,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-08-11 17:19:10,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1205180.0, ans=0.0 2024-08-11 17:19:15,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1205180.0, ans=0.125 2024-08-11 17:19:15,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1205180.0, ans=0.0 2024-08-11 17:19:15,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1205180.0, ans=0.05 2024-08-11 17:19:24,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1205280.0, ans=0.0 2024-08-11 17:19:26,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1205280.0, ans=0.025 2024-08-11 17:19:34,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4600, loss[loss=0.1224, beats_loss=0.01086, ecapa_loss=0.0001777, whisper_loss=0.1097, over 19925.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01112, ecapa_loss=0.0001943, whisper_loss=0.0935, over 3838549.12 frames. ], batch size: 77, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:19:37,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1205380.0, ans=0.2 2024-08-11 17:19:39,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1205380.0, ans=0.2 2024-08-11 17:19:48,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1205480.0, ans=15.0 2024-08-11 17:19:53,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1205480.0, ans=0.0 2024-08-11 17:19:54,196 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 17:20:29,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1205680.0, ans=0.125 2024-08-11 17:20:52,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1205880.0, ans=0.125 2024-08-11 17:20:54,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4650, loss[loss=0.1395, beats_loss=0.009684, ecapa_loss=0.0002141, whisper_loss=0.1277, over 22364.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01114, ecapa_loss=0.0001947, whisper_loss=0.09336, over 3863752.72 frames. ], batch size: 88, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:21:08,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.49 vs. 
limit=22.5 2024-08-11 17:21:09,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1205880.0, ans=0.125 2024-08-11 17:21:13,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1205980.0, ans=0.0 2024-08-11 17:21:15,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1205980.0, ans=0.125 2024-08-11 17:21:23,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.651e+01 2.897e+01 3.330e+01 4.454e+01, threshold=5.794e+01, percent-clipped=0.0 2024-08-11 17:21:32,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1206080.0, ans=0.125 2024-08-11 17:21:37,577 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 17:21:39,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1206080.0, ans=0.5 2024-08-11 17:21:47,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1206180.0, ans=0.125 2024-08-11 17:21:51,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1206180.0, ans=0.125 2024-08-11 17:21:55,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1206180.0, ans=0.1 2024-08-11 17:21:58,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1206280.0, ans=0.125 2024-08-11 17:22:12,742 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4700, loss[loss=0.12, beats_loss=0.009963, ecapa_loss=0.0001732, whisper_loss=0.1083, over 23594.00 
frames. ], tot_loss[loss=0.1073, beats_loss=0.01114, ecapa_loss=0.0001938, whisper_loss=0.09426, over 3868679.42 frames. ], batch size: 91, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:22:18,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1206380.0, ans=0.125 2024-08-11 17:22:33,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1206480.0, ans=0.125 2024-08-11 17:22:46,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1206580.0, ans=0.0 2024-08-11 17:22:54,031 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 17:23:08,744 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 17:23:13,880 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 17:23:19,356 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4750, loss[loss=0.09751, beats_loss=0.01046, ecapa_loss=0.0001965, whisper_loss=0.08509, over 18627.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01119, ecapa_loss=0.0001925, whisper_loss=0.09433, over 3879412.93 frames. ], batch size: 74, lr: 7.10e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:23:20,912 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 17:23:23,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-11 17:23:27,822 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 15 from Vox, 32 from AS
2024-08-11 17:23:37,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1206980.0, ans=0.1
2024-08-11 17:23:41,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1206980.0, ans=0.125
2024-08-11 17:23:42,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.773e+01 3.300e+01 3.701e+01 2.356e+02, threshold=6.600e+01, percent-clipped=1.0
2024-08-11 17:23:50,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1207080.0, ans=0.2
2024-08-11 17:24:01,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1207180.0, ans=0.0
2024-08-11 17:24:09,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1207180.0, ans=0.125
2024-08-11 17:24:15,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1207280.0, ans=0.0
2024-08-11 17:24:17,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1207280.0, ans=0.1
2024-08-11 17:24:20,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1207280.0, ans=0.125
2024-08-11 17:24:23,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=12.0
2024-08-11 17:24:26,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4800, loss[loss=0.1136, beats_loss=0.009951, ecapa_loss=0.0002032, whisper_loss=0.1016, over 22351.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01126, ecapa_loss=0.000195, whisper_loss=0.09375, over 3921788.53 frames. ], batch size: 89, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:24:26,221 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 from AS
2024-08-11 17:24:30,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1207380.0, ans=0.015
2024-08-11 17:24:37,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0
2024-08-11 17:24:49,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0
2024-08-11 17:24:50,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1207480.0, ans=0.0
2024-08-11 17:25:03,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0
2024-08-11 17:25:10,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1207680.0, ans=0.1
2024-08-11 17:25:13,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs.
limit=15.0
2024-08-11 17:25:18,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1207780.0, ans=0.5
2024-08-11 17:25:28,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1207780.0, ans=0.0
2024-08-11 17:25:32,939 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4850, loss[loss=0.1371, beats_loss=0.008658, ecapa_loss=0.0002117, whisper_loss=0.1263, over 17208.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01132, ecapa_loss=0.0001939, whisper_loss=0.09385, over 3945946.70 frames. ], batch size: 66, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:25:44,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1207880.0, ans=0.1
2024-08-11 17:25:45,497 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:25:55,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.586e+01 2.829e+01 3.279e+01 4.850e+01, threshold=5.658e+01, percent-clipped=0.0
2024-08-11 17:26:13,562 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.963e+00
2024-08-11 17:26:14,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1208180.0, ans=0.125
2024-08-11 17:26:25,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1208280.0, ans=0.125
2024-08-11 17:26:33,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1208280.0, ans=0.125
2024-08-11 17:26:33,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1208280.0, ans=0.0
2024-08-11 17:26:33,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1208280.0, ans=0.1
2024-08-11 17:26:34,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1208280.0, ans=0.125
2024-08-11 17:26:39,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4900, loss[loss=0.09975, beats_loss=0.01198, ecapa_loss=0.0001602, whisper_loss=0.08617, over 14502.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01128, ecapa_loss=0.0001933, whisper_loss=0.09396, over 3923829.84 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:26:44,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0
2024-08-11 17:27:13,473 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=12.0
2024-08-11 17:27:44,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.27 vs. limit=15.0
2024-08-11 17:27:47,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1208780.0, ans=0.125
2024-08-11 17:27:50,154 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 4950, loss[loss=0.09573, beats_loss=0.01219, ecapa_loss=0.0001728, whisper_loss=0.0818, over 22448.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.000194, whisper_loss=0.09281, over 3857374.30 frames. ], batch size: 91, lr: 7.09e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:27:52,456 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts.
19 from LS+wenet, 18 from Vox, 23 from AS
2024-08-11 17:28:05,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1208980.0, ans=0.125
2024-08-11 17:28:10,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0
2024-08-11 17:28:15,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.567e+01 2.832e+01 3.214e+01 4.886e+01, threshold=5.664e+01, percent-clipped=0.0
2024-08-11 17:28:18,196 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 from AS
2024-08-11 17:28:32,070 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 from AS
2024-08-11 17:28:44,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1209180.0, ans=0.125
2024-08-11 17:29:04,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5000, loss[loss=0.09377, beats_loss=0.00848, ecapa_loss=0.0002425, whisper_loss=0.08286, over 13936.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01123, ecapa_loss=0.000196, whisper_loss=0.09318, over 3861216.94 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:29:08,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1209380.0, ans=0.125
2024-08-11 17:29:09,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1209380.0, ans=0.125
2024-08-11 17:29:24,418 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 from AS
2024-08-11 17:29:32,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0
2024-08-11 17:29:38,887 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 31 from Vox, 38 from AS
2024-08-11 17:29:42,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0
2024-08-11 17:30:00,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1209680.0, ans=0.0
2024-08-11 17:30:05,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1209780.0, ans=0.125
2024-08-11 17:30:06,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1209780.0, ans=0.025
2024-08-11 17:30:08,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1209780.0, ans=0.0
2024-08-11 17:30:12,243 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 from AS
2024-08-11 17:30:18,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-08-11 17:30:19,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5050, loss[loss=0.0967, beats_loss=0.01184, ecapa_loss=0.0002053, whisper_loss=0.0828, over 17578.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.0001951, whisper_loss=0.09286, over 3857600.02 frames. ], batch size: 71, lr: 7.09e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:30:26,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1209880.0, ans=0.04949747468305833
2024-08-11 17:30:27,510 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts.
23 from LS+wenet, 19 from Vox, 37 from AS
2024-08-11 17:30:34,588 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:30:44,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.569e+01 2.847e+01 3.482e+01 7.100e+01, threshold=5.695e+01, percent-clipped=3.0
2024-08-11 17:30:46,702 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 from AS
2024-08-11 17:31:23,170 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS
2024-08-11 17:31:30,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0
2024-08-11 17:31:35,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5100, loss[loss=0.1076, beats_loss=0.01232, ecapa_loss=0.0001801, whisper_loss=0.09348, over 22498.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01139, ecapa_loss=0.0001929, whisper_loss=0.09209, over 3862329.28 frames. ], batch size: 93, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:31:42,681 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 from AS
2024-08-11 17:32:31,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1210680.0, ans=0.125
2024-08-11 17:32:55,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5150, loss[loss=0.1253, beats_loss=0.00793, ecapa_loss=0.0001949, whisper_loss=0.1154, over 19740.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01138, ecapa_loss=0.0001919, whisper_loss=0.09254, over 3865083.41 frames. ], batch size: 77, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:33:08,090 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 from AS
2024-08-11 17:33:22,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.614e+01 3.081e+01 3.730e+01 5.554e+01, threshold=6.161e+01, percent-clipped=0.0
2024-08-11 17:33:33,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1211080.0, ans=0.1
2024-08-11 17:33:36,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211080.0, ans=0.1
2024-08-11 17:34:07,532 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 from AS
2024-08-11 17:34:11,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5200, loss[loss=0.09674, beats_loss=0.01254, ecapa_loss=0.0001604, whisper_loss=0.08259, over 22482.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01135, ecapa_loss=0.0001912, whisper_loss=0.09252, over 3855084.40 frames. ], batch size: 90, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:34:25,117 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 from AS
2024-08-11 17:34:33,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1211480.0, ans=0.125
2024-08-11 17:34:43,355 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 31 from Vox, 32 from AS
2024-08-11 17:34:44,957 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts.
34 from LS+wenet, 27 from Vox, 33 from AS
2024-08-11 17:34:50,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1211580.0, ans=0.125
2024-08-11 17:35:05,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1211680.0, ans=0.125
2024-08-11 17:35:09,038 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS
2024-08-11 17:35:20,611 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 17:35:29,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5250, loss[loss=0.1081, beats_loss=0.008495, ecapa_loss=0.0002979, whisper_loss=0.09664, over 15680.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01123, ecapa_loss=0.0001924, whisper_loss=0.0924, over 3858207.44 frames. ], batch size: 68, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:35:38,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0
2024-08-11 17:35:48,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0
2024-08-11 17:35:51,114 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS
2024-08-11 17:35:57,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.672e+01 3.061e+01 3.448e+01 6.321e+01, threshold=6.122e+01, percent-clipped=2.0
2024-08-11 17:36:12,301 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 10 from Vox, 31 from AS
2024-08-11 17:36:18,690 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 from AS
2024-08-11 17:36:39,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1212280.0, ans=0.07
2024-08-11 17:36:46,605 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS
2024-08-11 17:36:46,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1212380.0, ans=0.025
2024-08-11 17:36:48,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5300, loss[loss=0.1148, beats_loss=0.009913, ecapa_loss=0.0002022, whisper_loss=0.1028, over 17638.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01123, ecapa_loss=0.000192, whisper_loss=0.09244, over 3873037.37 frames. ], batch size: 69, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:36:52,672 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 from AS
2024-08-11 17:37:04,416 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 from AS
2024-08-11 17:37:07,904 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 25 from Vox, 18 from AS
2024-08-11 17:37:16,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1212480.0, ans=0.07
2024-08-11 17:37:17,216 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 from AS
2024-08-11 17:37:29,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1212580.0, ans=0.125
2024-08-11 17:37:32,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1212680.0, ans=0.0
2024-08-11 17:37:58,733 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts.
11 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 17:38:02,759 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS
2024-08-11 17:38:04,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5350, loss[loss=0.1108, beats_loss=0.01082, ecapa_loss=0.0001789, whisper_loss=0.09818, over 22354.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01124, ecapa_loss=0.0001906, whisper_loss=0.0922, over 3844767.32 frames. ], batch size: 89, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:38:09,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5
2024-08-11 17:38:09,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.24 vs. limit=22.5
2024-08-11 17:38:15,398 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 from AS
2024-08-11 17:38:15,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1212880.0, ans=0.125
2024-08-11 17:38:16,673 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 17:38:25,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1212980.0, ans=0.125
2024-08-11 17:38:25,854 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5
2024-08-11 17:38:29,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.524e+01 2.904e+01 3.271e+01 6.276e+01, threshold=5.808e+01, percent-clipped=1.0
2024-08-11 17:38:37,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1213080.0, ans=0.1
2024-08-11 17:38:41,082 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 30 from Vox, 31 from AS
2024-08-11 17:38:49,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1213180.0, ans=0.2
2024-08-11 17:39:12,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1213280.0, ans=0.2
2024-08-11 17:39:13,160 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS
2024-08-11 17:39:25,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5400, loss[loss=0.09292, beats_loss=0.01575, ecapa_loss=0.0001518, whisper_loss=0.07565, over 17655.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001911, whisper_loss=0.09331, over 3847457.48 frames.
], batch size: 72, lr: 7.08e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:39:26,194 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.657e-01
2024-08-11 17:39:26,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1213380.0, ans=0.125
2024-08-11 17:39:39,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1213380.0, ans=0.0
2024-08-11 17:39:41,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1213480.0, ans=0.125
2024-08-11 17:39:55,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1213480.0, ans=0.0
2024-08-11 17:40:21,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1213680.0, ans=0.0
2024-08-11 17:40:34,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1213780.0, ans=0.125
2024-08-11 17:40:44,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5450, loss[loss=0.09098, beats_loss=0.01294, ecapa_loss=0.0001769, whisper_loss=0.07627, over 21890.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01115, ecapa_loss=0.0001914, whisper_loss=0.09393, over 3857192.21 frames. ], batch size: 86, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:41:00,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1213980.0, ans=0.125
2024-08-11 17:41:11,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.638e+01 2.966e+01 3.379e+01 5.199e+01, threshold=5.933e+01, percent-clipped=0.0
2024-08-11 17:41:18,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2024-08-11 17:41:19,476 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 from AS
2024-08-11 17:41:20,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=22.5
2024-08-11 17:41:42,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1214180.0, ans=0.0
2024-08-11 17:41:47,420 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 from AS
2024-08-11 17:41:52,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0
2024-08-11 17:41:56,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0
2024-08-11 17:42:03,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5500, loss[loss=0.0773, beats_loss=0.01061, ecapa_loss=0.0002387, whisper_loss=0.06431, over 17137.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01122, ecapa_loss=0.0001903, whisper_loss=0.09345, over 3869823.32 frames.
], batch size: 74, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:42:03,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1214380.0, ans=0.05
2024-08-11 17:42:05,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5
2024-08-11 17:42:10,970 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 from AS
2024-08-11 17:42:36,246 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS
2024-08-11 17:42:39,769 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 from AS
2024-08-11 17:42:43,771 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 15 from Vox, 47 from AS
2024-08-11 17:42:43,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1214580.0, ans=0.125
2024-08-11 17:42:44,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=22.5
2024-08-11 17:42:44,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.55 vs. limit=10.0
2024-08-11 17:43:06,752 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 16 from LS+wenet, 30 from Vox, 29 from AS
2024-08-11 17:43:06,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1214680.0, ans=0.125
2024-08-11 17:43:21,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1214780.0, ans=22.5
2024-08-11 17:43:21,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1214780.0, ans=10.0
2024-08-11 17:43:25,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5550, loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001768, whisper_loss=0.08858, over 23202.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01126, ecapa_loss=0.0001907, whisper_loss=0.09354, over 3898887.07 frames. ], batch size: 94, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:43:39,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1214880.0, ans=0.125
2024-08-11 17:43:53,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.662e+01 2.905e+01 3.480e+01 6.680e+01, threshold=5.810e+01, percent-clipped=1.0
2024-08-11 17:43:56,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1214980.0, ans=0.0
2024-08-11 17:44:07,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1215080.0, ans=0.125
2024-08-11 17:44:10,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.96 vs.
limit=22.5
2024-08-11 17:44:15,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1215180.0, ans=0.125
2024-08-11 17:44:18,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1215180.0, ans=0.0
2024-08-11 17:44:25,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1215180.0, ans=0.125
2024-08-11 17:44:28,285 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 20 from Vox, 43 from AS
2024-08-11 17:44:43,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1215280.0, ans=0.125
2024-08-11 17:44:46,483 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5600, loss[loss=0.1098, beats_loss=0.01027, ecapa_loss=0.0001849, whisper_loss=0.09767, over 20501.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01121, ecapa_loss=0.0001898, whisper_loss=0.09417, over 3878923.26 frames. ], batch size: 82, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:44:57,066 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-11 17:45:03,047 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 27 from Vox, 24 from AS
2024-08-11 17:45:07,334 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.958e-01
2024-08-11 17:45:14,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1215480.0, ans=0.0
2024-08-11 17:45:51,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0
2024-08-11 17:46:02,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1215780.0, ans=0.0
2024-08-11 17:46:04,199 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 35 from LS+wenet, 15 from Vox, 25 from AS
2024-08-11 17:46:05,614 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5650, loss[loss=0.1367, beats_loss=0.009474, ecapa_loss=0.0002118, whisper_loss=0.1251, over 19011.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0113, ecapa_loss=0.0001919, whisper_loss=0.09341, over 3883118.69 frames. ], batch size: 75, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:46:27,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1215980.0, ans=0.0
2024-08-11 17:46:31,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.596e+01 3.008e+01 3.518e+01 5.757e+01, threshold=6.016e+01, percent-clipped=0.0
2024-08-11 17:46:38,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1216080.0, ans=0.1
2024-08-11 17:47:22,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5700, loss[loss=0.1308, beats_loss=0.009505, ecapa_loss=0.0002298, whisper_loss=0.119, over 21028.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01133, ecapa_loss=0.0001914, whisper_loss=0.09263, over 3894099.41 frames. ], batch size: 85, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:47:34,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0
2024-08-11 17:47:40,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1216480.0, ans=0.1
2024-08-11 17:47:49,859 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts.
25 from LS+wenet, 25 from Vox, 36 from AS
2024-08-11 17:48:00,509 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS
2024-08-11 17:48:07,699 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 17:48:07,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1216580.0, ans=0.125
2024-08-11 17:48:11,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1216680.0, ans=10.0
2024-08-11 17:48:28,982 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:48:30,547 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 17:48:36,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1216780.0, ans=0.035
2024-08-11 17:48:42,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5750, loss[loss=0.1278, beats_loss=0.01176, ecapa_loss=0.0001948, whisper_loss=0.1141, over 22937.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01134, ecapa_loss=0.0001915, whisper_loss=0.09296, over 3882822.50 frames. ], batch size: 90, lr: 7.07e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:48:52,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2024-08-11 17:48:59,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0
2024-08-11 17:49:02,006 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 17:49:02,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5
2024-08-11 17:49:05,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1216980.0, ans=0.0
2024-08-11 17:49:08,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.576e+01 2.968e+01 3.290e+01 6.597e+01, threshold=5.936e+01, percent-clipped=1.0
2024-08-11 17:49:11,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=22.5
2024-08-11 17:49:15,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1217080.0, ans=0.125
2024-08-11 17:49:25,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1217080.0, ans=0.125
2024-08-11 17:49:28,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217180.0, ans=0.1
2024-08-11 17:49:36,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1217180.0, ans=0.2
2024-08-11 17:49:51,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1217280.0, ans=0.0
2024-08-11 17:50:00,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5800, loss[loss=0.1122, beats_loss=0.00855, ecapa_loss=0.0002419, whisper_loss=0.1012, over 14005.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.0001916, whisper_loss=0.09282, over 3840921.99 frames. ], batch size: 57, lr: 7.06e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:50:16,708 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 from AS
2024-08-11 17:50:17,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1217480.0, ans=0.0
2024-08-11 17:50:23,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1217480.0, ans=0.09899494936611666
2024-08-11 17:50:29,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1217580.0, ans=0.1
2024-08-11 17:50:38,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.97 vs. limit=10.0
2024-08-11 17:50:42,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5
2024-08-11 17:51:00,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1217780.0, ans=0.125
2024-08-11 17:51:15,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5850, loss[loss=0.11, beats_loss=0.01302, ecapa_loss=0.0001738, whisper_loss=0.09526, over 19611.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01122, ecapa_loss=0.0001927, whisper_loss=0.09298, over 3873059.18 frames. ], batch size: 80, lr: 7.06e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:51:21,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1217880.0, ans=0.04949747468305833
2024-08-11 17:51:31,394 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 17:51:39,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.535e+01 2.906e+01 3.221e+01 4.693e+01, threshold=5.811e+01, percent-clipped=0.0 2024-08-11 17:51:50,348 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 17:52:05,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1218180.0, ans=0.125 2024-08-11 17:52:07,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1218180.0, ans=0.0 2024-08-11 17:52:16,176 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 17:52:21,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1218280.0, ans=0.0 2024-08-11 17:52:29,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5900, loss[loss=0.1266, beats_loss=0.01013, ecapa_loss=0.0002057, whisper_loss=0.1144, over 18998.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01128, ecapa_loss=0.000192, whisper_loss=0.09172, over 3833927.19 frames. ], batch size: 73, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:52:29,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1218380.0, ans=0.2 2024-08-11 17:52:41,716 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 17:52:43,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1218480.0, ans=0.125 2024-08-11 17:52:45,227 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 17:53:20,333 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 17:53:41,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1218780.0, ans=0.125 2024-08-11 17:53:47,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 5950, loss[loss=0.09072, beats_loss=0.01427, ecapa_loss=0.0001901, whisper_loss=0.07454, over 20395.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01119, ecapa_loss=0.0001928, whisper_loss=0.09236, over 3811920.50 frames. ], batch size: 87, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:54:02,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1218980.0, ans=0.125 2024-08-11 17:54:13,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.543e+01 2.844e+01 3.292e+01 4.976e+01, threshold=5.688e+01, percent-clipped=0.0 2024-08-11 17:54:27,841 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 17:54:41,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1219180.0, ans=0.2 2024-08-11 17:54:44,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1219180.0, ans=0.125 2024-08-11 17:54:45,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1219180.0, ans=0.2 2024-08-11 17:55:00,478 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 17:55:03,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6000, loss[loss=0.1152, beats_loss=0.01254, ecapa_loss=0.0002723, whisper_loss=0.09998, over 22646.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001924, whisper_loss=0.09302, over 3863241.26 frames. 
], batch size: 95, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:55:03,128 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 17:55:39,324 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006361, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 17:55:57,553 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on SV_voxceleb1: loss=0.005086, beats_loss=0, ecapa_loss=0.0005086, whisper_loss=0, over 939242.00 frames. 2024-08-11 17:57:42,058 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on AT_audioset: loss=0.02513, beats_loss=0.02513, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 17:57:42,062 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 17:58:12,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1219480.0, ans=0.0 2024-08-11 17:58:34,185 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 17:58:42,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1219680.0, ans=0.1 2024-08-11 17:59:06,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6050, loss[loss=0.11, beats_loss=0.01292, ecapa_loss=0.0002034, whisper_loss=0.095, over 22374.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01126, ecapa_loss=0.0001914, whisper_loss=0.09292, over 3890195.26 frames. 
], batch size: 92, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:59:26,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1219980.0, ans=0.125 2024-08-11 17:59:27,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1219980.0, ans=0.02 2024-08-11 17:59:34,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.877e+01 3.382e+01 4.916e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-11 17:59:51,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1220080.0, ans=0.125 2024-08-11 18:00:17,715 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 18:00:29,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6100, loss[loss=0.1176, beats_loss=0.009134, ecapa_loss=0.0002142, whisper_loss=0.1063, over 19023.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001931, whisper_loss=0.09261, over 3908799.77 frames. ], batch size: 76, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:00:50,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1220480.0, ans=0.95 2024-08-11 18:01:06,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1220580.0, ans=0.125 2024-08-11 18:01:20,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-11 18:01:32,866 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 18:01:52,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6150, loss[loss=0.1108, beats_loss=0.01237, ecapa_loss=0.0001558, whisper_loss=0.09685, over 18975.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.000193, whisper_loss=0.09324, over 3905412.12 frames. ], batch size: 74, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:01:53,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1220880.0, ans=0.0 2024-08-11 18:01:54,694 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 18:01:59,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1220880.0, ans=0.2 2024-08-11 18:02:01,996 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 16 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 18:02:07,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1220980.0, ans=0.0 2024-08-11 18:02:13,304 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 18:02:20,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.665e+01 2.922e+01 3.415e+01 6.689e+01, threshold=5.844e+01, percent-clipped=1.0 2024-08-11 18:02:33,184 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 18:02:36,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1221080.0, ans=0.125 2024-08-11 18:02:41,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1221180.0, ans=0.2 2024-08-11 18:02:59,395 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 18:03:02,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1221280.0, ans=0.09899494936611666 2024-08-11 18:03:11,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6200, loss[loss=0.1189, beats_loss=0.009085, ecapa_loss=0.0002171, whisper_loss=0.1076, over 20482.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001918, whisper_loss=0.09323, over 3888570.62 frames. ], batch size: 82, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:03:15,149 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 18:03:49,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1221580.0, ans=0.2 2024-08-11 18:03:50,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-11 18:03:51,480 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 18:03:53,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1221580.0, ans=0.0 2024-08-11 18:03:54,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1221580.0, ans=0.125 2024-08-11 18:03:59,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1221680.0, ans=0.1 2024-08-11 18:04:11,584 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 18:04:14,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.09 vs. 
limit=15.0 2024-08-11 18:04:31,906 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6250, loss[loss=0.08857, beats_loss=0.01218, ecapa_loss=0.0002221, whisper_loss=0.07417, over 16718.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0001914, whisper_loss=0.09231, over 3864593.79 frames. ], batch size: 72, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:04:35,529 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-11 18:04:35,760 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:04:57,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1221980.0, ans=0.125 2024-08-11 18:04:58,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.590e+01 2.864e+01 3.315e+01 6.460e+01, threshold=5.728e+01, percent-clipped=1.0 2024-08-11 18:05:01,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=22.5 2024-08-11 18:05:05,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1222080.0, ans=0.0 2024-08-11 18:05:26,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1222180.0, ans=0.0 2024-08-11 18:05:52,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6300, loss[loss=0.09669, beats_loss=0.01302, ecapa_loss=0.0001968, whisper_loss=0.0817, over 20391.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01127, ecapa_loss=0.0001926, whisper_loss=0.09282, over 3859804.58 frames. 
], batch size: 86, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:05:52,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1222380.0, ans=0.025 2024-08-11 18:07:20,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1222680.0, ans=0.2 2024-08-11 18:07:20,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1222680.0, ans=0.0 2024-08-11 18:07:21,505 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 18:07:28,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2024-08-11 18:07:30,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1222780.0, ans=0.0 2024-08-11 18:07:46,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6350, loss[loss=0.1063, beats_loss=0.01174, ecapa_loss=0.0001958, whisper_loss=0.09261, over 22329.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01132, ecapa_loss=0.0001929, whisper_loss=0.09334, over 3885669.37 frames. ], batch size: 91, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:08:17,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.576e+01 2.972e+01 3.431e+01 4.977e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 18:08:24,839 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 18:08:25,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1223080.0, ans=0.125 2024-08-11 18:08:31,806 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 18:08:57,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1223180.0, ans=0.0 2024-08-11 18:09:02,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1223180.0, ans=0.0 2024-08-11 18:09:32,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6400, loss[loss=0.121, beats_loss=0.01211, ecapa_loss=0.0001556, whisper_loss=0.1073, over 20593.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01135, ecapa_loss=0.000192, whisper_loss=0.09301, over 3907132.68 frames. ], batch size: 79, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:09:32,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1223380.0, ans=0.0 2024-08-11 18:09:39,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=12.0 2024-08-11 18:09:58,576 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 18:10:03,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1223480.0, ans=0.125 2024-08-11 18:10:30,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1223580.0, ans=0.0 2024-08-11 18:10:30,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223580.0, ans=0.1 2024-08-11 18:10:34,901 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 18:11:23,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6450, loss[loss=0.1068, beats_loss=0.01231, ecapa_loss=0.0001671, whisper_loss=0.09283, over 19020.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.000193, whisper_loss=0.093, over 3928617.45 frames. ], batch size: 75, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:11:24,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1223880.0, ans=0.0 2024-08-11 18:11:55,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-11 18:11:55,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2024-08-11 18:12:05,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1223980.0, ans=0.125 2024-08-11 18:12:08,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.662e+01 3.047e+01 3.508e+01 5.395e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 18:12:08,298 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 18:12:10,150 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:12:13,538 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 41 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 18:12:24,915 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-11 18:12:32,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1224080.0, ans=0.0 2024-08-11 18:13:02,049 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:13:26,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6500, loss[loss=0.1286, beats_loss=0.01043, ecapa_loss=0.0001847, whisper_loss=0.1163, over 16530.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0113, ecapa_loss=0.0001928, whisper_loss=0.09435, over 3959505.56 frames. ], batch size: 62, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:13:28,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-11 18:13:39,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1224380.0, ans=0.09899494936611666 2024-08-11 18:13:54,411 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-11 18:14:16,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-11 18:14:21,300 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.781e+05 2024-08-11 18:14:37,029 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
25 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-11 18:14:37,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1224680.0, ans=0.125 2024-08-11 18:14:41,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1224680.0, ans=0.125 2024-08-11 18:14:44,236 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 18:14:52,462 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 18:15:24,741 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6550, loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001542, whisper_loss=0.09259, over 20364.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0001933, whisper_loss=0.09438, over 3972399.73 frames. ], batch size: 77, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:15:31,758 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 18:16:02,663 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 18:16:06,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.812e+01 3.232e+01 4.010e+01 5.660e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-11 18:16:18,838 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 18:16:31,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-08-11 18:17:05,028 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6600, loss[loss=0.1038, beats_loss=0.01216, ecapa_loss=0.0001587, whisper_loss=0.09009, over 14389.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01117, ecapa_loss=0.0001932, whisper_loss=0.09475, over 3965737.80 frames. 
], batch size: 55, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:17:08,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1225380.0, ans=0.05 2024-08-11 18:17:31,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1225480.0, ans=0.0 2024-08-11 18:18:25,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1225780.0, ans=0.0 2024-08-11 18:18:25,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-11 18:18:28,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1225780.0, ans=0.125 2024-08-11 18:18:33,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6650, loss[loss=0.1028, beats_loss=0.01152, ecapa_loss=0.0002144, whisper_loss=0.08912, over 20143.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01122, ecapa_loss=0.0001928, whisper_loss=0.09413, over 3984385.03 frames. ], batch size: 83, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:18:38,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1225880.0, ans=0.125 2024-08-11 18:18:42,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.27 vs. 
limit=15.0 2024-08-11 18:19:02,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.730e+01 3.226e+01 3.856e+01 7.096e+01, threshold=6.452e+01, percent-clipped=1.0 2024-08-11 18:19:12,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1226080.0, ans=0.125 2024-08-11 18:19:46,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.12 vs. limit=22.5 2024-08-11 18:19:50,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1226280.0, ans=0.1 2024-08-11 18:19:56,287 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 18:20:00,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6700, loss[loss=0.09933, beats_loss=0.01063, ecapa_loss=0.0001916, whisper_loss=0.08678, over 15167.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01121, ecapa_loss=0.000193, whisper_loss=0.09422, over 3959040.17 frames. ], batch size: 58, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:20:05,988 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.594e-01 2024-08-11 18:20:23,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1226480.0, ans=0.125 2024-08-11 18:20:33,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2024-08-11 18:20:44,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1226580.0, ans=0.0 2024-08-11 18:20:46,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-11 18:21:06,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=22.5 2024-08-11 18:21:18,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1226780.0, ans=0.1 2024-08-11 18:21:21,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1226780.0, ans=0.125 2024-08-11 18:21:24,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1226880.0, ans=0.125 2024-08-11 18:21:25,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6750, loss[loss=0.08814, beats_loss=0.01379, ecapa_loss=0.0001703, whisper_loss=0.07265, over 18536.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01125, ecapa_loss=0.0001936, whisper_loss=0.09332, over 3902139.92 frames. ], batch size: 73, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:21:26,825 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
30 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 18:21:57,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.725e+01 3.041e+01 3.593e+01 5.305e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 18:22:18,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1227180.0, ans=0.0 2024-08-11 18:22:30,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1227180.0, ans=0.125 2024-08-11 18:22:30,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1227180.0, ans=0.0 2024-08-11 18:22:41,471 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.610e+02 2024-08-11 18:22:43,547 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-11 18:22:51,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1227380.0, ans=0.125 2024-08-11 18:22:52,592 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6800, loss[loss=0.09802, beats_loss=0.01252, ecapa_loss=0.0001743, whisper_loss=0.08376, over 21114.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01115, ecapa_loss=0.0001947, whisper_loss=0.09328, over 3896665.30 frames. 
], batch size: 86, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:22:56,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1227380.0, ans=0.0 2024-08-11 18:22:56,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1227380.0, ans=0.125 2024-08-11 18:22:56,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1227380.0, ans=0.0 2024-08-11 18:23:07,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1227380.0, ans=0.0 2024-08-11 18:23:32,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1227580.0, ans=0.125 2024-08-11 18:23:42,746 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 17 from LS+wenet, 30 from Vox, 45 fro AS 2024-08-11 18:24:08,336 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 18:24:10,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1227780.0, ans=0.0 2024-08-11 18:24:20,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6850, loss[loss=0.09745, beats_loss=0.01178, ecapa_loss=0.0001647, whisper_loss=0.08403, over 16533.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01134, ecapa_loss=0.0001937, whisper_loss=0.09177, over 3879514.82 frames. ], batch size: 64, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:24:34,127 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 18:24:40,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1227980.0, ans=0.2 2024-08-11 18:24:43,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1227980.0, ans=0.125 2024-08-11 18:24:45,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1227980.0, ans=0.125 2024-08-11 18:24:52,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.557e+01 2.801e+01 3.138e+01 4.430e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-11 18:24:53,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1227980.0, ans=0.125 2024-08-11 18:25:16,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-08-11 18:25:33,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1228280.0, ans=0.1 2024-08-11 18:25:46,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1228280.0, ans=0.0 2024-08-11 18:25:52,438 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6900, loss[loss=0.1121, beats_loss=0.009843, ecapa_loss=0.0001676, whisper_loss=0.1005, over 20558.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01129, ecapa_loss=0.0001955, whisper_loss=0.09161, over 3884561.63 frames. 
], batch size: 77, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:26:06,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1228380.0, ans=0.1 2024-08-11 18:26:10,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1228480.0, ans=0.0 2024-08-11 18:26:10,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-08-11 18:26:28,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1228580.0, ans=0.125 2024-08-11 18:26:32,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1228580.0, ans=0.125 2024-08-11 18:26:38,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1228580.0, ans=0.125 2024-08-11 18:26:56,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1228680.0, ans=0.0 2024-08-11 18:27:01,762 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-11 18:27:23,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 6950, loss[loss=0.126, beats_loss=0.009528, ecapa_loss=0.0002321, whisper_loss=0.1141, over 21734.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0001932, whisper_loss=0.09232, over 3921904.06 frames. 
], batch size: 86, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:27:54,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1228980.0, ans=0.125 2024-08-11 18:27:56,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.618e+01 3.004e+01 3.400e+01 5.942e+01, threshold=6.008e+01, percent-clipped=1.0 2024-08-11 18:28:03,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1229080.0, ans=0.5 2024-08-11 18:28:15,157 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 18:28:26,653 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 18:28:42,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1229280.0, ans=0.125 2024-08-11 18:28:52,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1229380.0, ans=0.125 2024-08-11 18:28:54,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7000, loss[loss=0.1111, beats_loss=0.01152, ecapa_loss=0.0002352, whisper_loss=0.09722, over 21319.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001932, whisper_loss=0.09244, over 3889449.40 frames. ], batch size: 93, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:29:02,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1229380.0, ans=0.2 2024-08-11 18:29:15,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.06 vs. 
limit=22.5 2024-08-11 18:29:24,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1229480.0, ans=0.1 2024-08-11 18:30:11,943 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 18:30:23,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7050, loss[loss=0.1161, beats_loss=0.01117, ecapa_loss=0.0002183, whisper_loss=0.1027, over 21429.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01134, ecapa_loss=0.0001919, whisper_loss=0.09166, over 3882338.35 frames. ], batch size: 88, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:30:38,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1229980.0, ans=0.125 2024-08-11 18:30:45,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1229980.0, ans=0.09899494936611666 2024-08-11 18:30:47,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1229980.0, ans=0.125 2024-08-11 18:30:54,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.713e+01 3.050e+01 3.555e+01 6.661e+01, threshold=6.100e+01, percent-clipped=2.0 2024-08-11 18:31:19,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1230180.0, ans=0.0 2024-08-11 18:31:28,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-08-11 18:31:30,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1230180.0, ans=0.125 2024-08-11 18:31:34,191 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 18:31:48,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1230280.0, ans=0.0 2024-08-11 18:31:49,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1230280.0, ans=0.0 2024-08-11 18:31:51,134 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 18:31:52,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7100, loss[loss=0.09397, beats_loss=0.009998, ecapa_loss=0.0001535, whisper_loss=0.08243, over 14435.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01138, ecapa_loss=0.0001901, whisper_loss=0.09161, over 3870566.02 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:32:02,343 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 18:32:09,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1230480.0, ans=0.125 2024-08-11 18:32:33,700 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-11 18:32:36,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1230580.0, ans=0.025 2024-08-11 18:32:36,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1230580.0, ans=0.125 2024-08-11 18:32:38,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1230580.0, ans=0.125 2024-08-11 18:32:43,500 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 18:32:47,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-11 18:32:51,319 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 18:32:58,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1230680.0, ans=0.0 2024-08-11 18:33:00,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1230680.0, ans=0.125 2024-08-11 18:33:21,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7150, loss[loss=0.1165, beats_loss=0.01128, ecapa_loss=0.0002191, whisper_loss=0.1031, over 17684.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01135, ecapa_loss=0.0001924, whisper_loss=0.0916, over 3858904.60 frames. ], batch size: 75, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:33:23,851 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 18:33:34,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1230880.0, ans=0.125 2024-08-11 18:33:49,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1230980.0, ans=0.0 2024-08-11 18:33:50,880 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 18:33:54,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.688e+01 3.029e+01 3.368e+01 5.006e+01, threshold=6.058e+01, percent-clipped=0.0 2024-08-11 18:34:01,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1231080.0, ans=0.125 2024-08-11 18:34:15,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2024-08-11 18:34:32,926 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 18:34:34,711 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-11 18:34:36,117 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 18:34:49,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1231280.0, ans=0.125 2024-08-11 18:34:50,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1231280.0, ans=0.09899494936611666 2024-08-11 18:34:55,367 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7200, loss[loss=0.08846, beats_loss=0.01247, ecapa_loss=0.0002646, whisper_loss=0.07335, over 18101.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01135, ecapa_loss=0.0001925, whisper_loss=0.09103, over 3827808.74 frames. ], batch size: 77, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:34:59,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.48 vs. 
limit=22.5 2024-08-11 18:35:00,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1231380.0, ans=0.1 2024-08-11 18:35:01,766 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 18:35:12,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1231480.0, ans=0.125 2024-08-11 18:35:14,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1231480.0, ans=0.125 2024-08-11 18:35:57,324 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 18:35:59,051 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 18:36:21,817 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7250, loss[loss=0.1223, beats_loss=0.009228, ecapa_loss=0.0001529, whisper_loss=0.1116, over 15333.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01128, ecapa_loss=0.0001914, whisper_loss=0.09201, over 3839194.25 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:36:27,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1231880.0, ans=0.0 2024-08-11 18:36:33,595 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
30 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 18:36:51,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.618e+01 2.954e+01 3.399e+01 5.489e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 18:37:14,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1232180.0, ans=0.0 2024-08-11 18:37:35,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1232280.0, ans=0.0 2024-08-11 18:37:45,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7300, loss[loss=0.09422, beats_loss=0.01238, ecapa_loss=0.0001407, whisper_loss=0.08044, over 17245.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01122, ecapa_loss=0.0001922, whisper_loss=0.09277, over 3840120.49 frames. ], batch size: 67, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:37:50,857 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-11 18:37:56,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1232380.0, ans=0.0 2024-08-11 18:38:01,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1232480.0, ans=0.125 2024-08-11 18:38:59,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1232780.0, ans=0.125 2024-08-11 18:39:02,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1232780.0, ans=0.025 2024-08-11 18:39:09,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7350, loss[loss=0.1137, beats_loss=0.01243, ecapa_loss=0.0001735, whisper_loss=0.09958, over 17238.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.0001919, whisper_loss=0.0926, over 3837228.14 frames. ], batch size: 67, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:39:17,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1232880.0, ans=0.0 2024-08-11 18:39:21,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1232880.0, ans=0.125 2024-08-11 18:39:39,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.552e+01 3.033e+01 3.374e+01 5.510e+01, threshold=6.067e+01, percent-clipped=0.0 2024-08-11 18:39:47,825 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 18:39:49,620 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 18:39:55,302 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 34 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 18:40:02,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1233180.0, ans=0.1 2024-08-11 18:40:16,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1233280.0, ans=15.0 2024-08-11 18:40:32,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7400, loss[loss=0.1093, beats_loss=0.01024, ecapa_loss=0.0001875, whisper_loss=0.0972, over 23705.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001923, whisper_loss=0.09258, over 3875535.34 frames. 
], batch size: 93, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:40:46,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1233380.0, ans=0.125 2024-08-11 18:40:56,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1233480.0, ans=0.125 2024-08-11 18:40:59,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1233480.0, ans=0.125 2024-08-11 18:41:24,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1233680.0, ans=0.1 2024-08-11 18:41:27,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2024-08-11 18:41:50,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1233780.0, ans=0.2 2024-08-11 18:41:55,582 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7450, loss[loss=0.1, beats_loss=0.0127, ecapa_loss=0.0001713, whisper_loss=0.08558, over 14351.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001935, whisper_loss=0.09295, over 3863331.56 frames. ], batch size: 57, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:42:07,895 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 18:42:12,966 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 18:42:15,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1233980.0, ans=0.09899494936611666 2024-08-11 18:42:16,492 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:42:24,410 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 18:42:28,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.705e+01 3.012e+01 3.463e+01 6.106e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-11 18:42:32,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1234080.0, ans=0.125 2024-08-11 18:42:38,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1234080.0, ans=10.0 2024-08-11 18:42:49,948 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 18:42:53,627 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 18:43:03,041 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 18:43:21,382 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 18:43:22,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7500, loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0002005, whisper_loss=0.09015, over 18452.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01117, ecapa_loss=0.0001934, whisper_loss=0.0932, over 3891663.78 frames. 
], batch size: 75, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:43:29,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1234380.0, ans=0.125 2024-08-11 18:43:33,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1234380.0, ans=0.035 2024-08-11 18:43:46,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1234480.0, ans=0.0 2024-08-11 18:43:50,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1234480.0, ans=0.125 2024-08-11 18:44:04,643 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.625e-03 2024-08-11 18:44:10,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2024-08-11 18:44:25,434 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.249e+00 2024-08-11 18:44:27,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1234780.0, ans=0.125 2024-08-11 18:44:30,121 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 18:44:32,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=15.0 2024-08-11 18:44:44,416 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7550, loss[loss=0.09809, beats_loss=0.008646, ecapa_loss=0.0001906, whisper_loss=0.08754, over 19714.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001954, whisper_loss=0.09299, over 3852171.59 frames. 
], batch size: 78, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:44:50,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1234880.0, ans=10.0 2024-08-11 18:44:59,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1234980.0, ans=0.0 2024-08-11 18:45:12,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.591e+01 2.941e+01 3.490e+01 1.489e+02, threshold=5.883e+01, percent-clipped=2.0 2024-08-11 18:45:32,283 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 18:45:32,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1235180.0, ans=0.1 2024-08-11 18:45:45,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1235180.0, ans=0.125 2024-08-11 18:45:48,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1235280.0, ans=0.125 2024-08-11 18:45:49,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1235280.0, ans=0.0 2024-08-11 18:45:59,872 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 18:46:05,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1235380.0, ans=0.125 2024-08-11 18:46:07,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7600, loss[loss=0.08746, beats_loss=0.01193, ecapa_loss=0.0001675, whisper_loss=0.07385, over 16101.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001957, whisper_loss=0.0926, over 3866646.49 frames. 
], batch size: 64, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:46:19,443 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 18:46:25,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1235480.0, ans=0.125 2024-08-11 18:46:55,826 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 20 from LS+wenet, 23 from Vox, 52 fro AS 2024-08-11 18:47:05,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1235680.0, ans=0.0 2024-08-11 18:47:05,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-11 18:47:07,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1235680.0, ans=0.0 2024-08-11 18:47:12,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1235680.0, ans=0.125 2024-08-11 18:47:13,947 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 18:47:26,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2024-08-11 18:47:32,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1235880.0, ans=0.125 2024-08-11 18:47:34,262 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7650, loss[loss=0.1311, beats_loss=0.009337, ecapa_loss=0.0002089, whisper_loss=0.1197, over 17640.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001948, whisper_loss=0.0933, over 3842030.88 frames. 
], batch size: 70, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:47:36,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1235880.0, ans=0.0 2024-08-11 18:48:04,433 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.623e+01 3.033e+01 3.717e+01 6.248e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 18:48:09,199 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-11 18:48:10,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.03 vs. limit=22.5 2024-08-11 18:48:22,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1236080.0, ans=0.125 2024-08-11 18:48:25,757 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 18:48:27,805 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 18:48:31,054 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 18:49:00,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7700, loss[loss=0.1137, beats_loss=0.01049, ecapa_loss=0.0001667, whisper_loss=0.1016, over 23063.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01111, ecapa_loss=0.0001946, whisper_loss=0.09298, over 3889392.37 frames. ], batch size: 89, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:49:09,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1236380.0, ans=0.2 2024-08-11 18:49:21,922 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 18:49:25,724 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:50:22,653 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7750, loss[loss=0.1122, beats_loss=0.01056, ecapa_loss=0.0001847, whisper_loss=0.09979, over 14525.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01115, ecapa_loss=0.000193, whisper_loss=0.09262, over 3894717.84 frames. ], batch size: 54, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:50:24,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1236880.0, ans=0.125 2024-08-11 18:50:40,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1236980.0, ans=0.1 2024-08-11 18:50:51,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1236980.0, ans=0.1 2024-08-11 18:50:52,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.693e+01 2.903e+01 3.373e+01 1.168e+02, threshold=5.806e+01, percent-clipped=1.0 2024-08-11 18:51:07,772 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 18:51:10,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1237180.0, ans=0.125 2024-08-11 18:51:22,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1237180.0, ans=0.1 2024-08-11 18:51:27,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1237280.0, ans=0.125 2024-08-11 18:51:37,350 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 18:51:37,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1237280.0, ans=0.125 2024-08-11 18:51:41,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7800, loss[loss=0.1129, beats_loss=0.01292, ecapa_loss=0.0001831, whisper_loss=0.09811, over 21718.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01114, ecapa_loss=0.0001909, whisper_loss=0.09311, over 3905028.06 frames. ], batch size: 89, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:51:54,678 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 35 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 18:52:19,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=12.0 2024-08-11 18:52:37,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1237680.0, ans=0.2 2024-08-11 18:52:48,233 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 18:52:53,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1237780.0, ans=0.125 2024-08-11 18:52:53,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1237780.0, ans=0.125 2024-08-11 18:52:57,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7850, loss[loss=0.08952, beats_loss=0.01106, ecapa_loss=0.0001893, whisper_loss=0.07658, over 13418.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01116, ecapa_loss=0.0001921, whisper_loss=0.09321, over 3889545.88 frames. 
], batch size: 55, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:52:57,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1237880.0, ans=0.0 2024-08-11 18:53:02,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1237880.0, ans=0.1 2024-08-11 18:53:13,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1237980.0, ans=0.0 2024-08-11 18:53:21,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1237980.0, ans=0.125 2024-08-11 18:53:24,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.579e+01 2.865e+01 3.320e+01 8.816e+01, threshold=5.729e+01, percent-clipped=1.0 2024-08-11 18:53:26,057 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 18:53:36,576 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:53:43,919 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 18:53:46,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1238180.0, ans=0.125 2024-08-11 18:53:49,927 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=12.0 2024-08-11 18:53:53,909 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 18:54:09,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1238280.0, ans=0.125 2024-08-11 18:54:13,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7900, loss[loss=0.09288, beats_loss=0.01093, ecapa_loss=0.0002038, whisper_loss=0.07991, over 20964.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001917, whisper_loss=0.09327, over 3888092.64 frames. ], batch size: 87, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:54:28,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1238480.0, ans=0.125 2024-08-11 18:54:36,893 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 26 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-11 18:54:39,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1238480.0, ans=0.125 2024-08-11 18:54:44,402 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 40 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 18:54:49,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2024-08-11 18:54:52,754 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 18:55:01,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1238680.0, ans=0.125 2024-08-11 18:55:15,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-11 18:55:17,641 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
20 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 18:55:27,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 7950, loss[loss=0.0989, beats_loss=0.01226, ecapa_loss=0.0001903, whisper_loss=0.08474, over 21906.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01117, ecapa_loss=0.0001915, whisper_loss=0.09375, over 3909721.19 frames. ], batch size: 89, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:55:37,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-11 18:55:47,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1238980.0, ans=0.0 2024-08-11 18:55:52,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.748e+01 3.056e+01 3.459e+01 5.765e+01, threshold=6.112e+01, percent-clipped=1.0 2024-08-11 18:56:02,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1239080.0, ans=0.125 2024-08-11 18:56:15,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-11 18:56:17,543 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 18:56:31,810 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 18:56:33,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-11 18:56:37,558 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8000, loss[loss=0.1207, beats_loss=0.009401, ecapa_loss=0.0002042, whisper_loss=0.1093, over 16979.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01109, ecapa_loss=0.0001921, whisper_loss=0.09402, over 3906701.04 frames. ], batch size: 66, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:56:41,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-11 18:56:41,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2024-08-11 18:56:55,137 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:57:17,893 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 18:57:19,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1239680.0, ans=0.125 2024-08-11 18:57:21,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1239680.0, ans=0.125 2024-08-11 18:57:28,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2024-08-11 18:57:32,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1239680.0, ans=0.125 2024-08-11 18:57:39,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1239780.0, ans=0.0 2024-08-11 18:57:46,999 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 18:57:48,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8050, loss[loss=0.08037, beats_loss=0.01097, ecapa_loss=0.0001644, whisper_loss=0.06776, over 14722.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01118, ecapa_loss=0.0001927, whisper_loss=0.09324, over 3860659.13 frames. ], batch size: 58, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:57:50,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=8.0 2024-08-11 18:58:10,688 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 18:58:12,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-11 18:58:14,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.885e+01 3.265e+01 3.759e+01 1.907e+02, threshold=6.530e+01, percent-clipped=2.0 2024-08-11 18:58:19,611 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 18:58:38,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240180.0, ans=0.1 2024-08-11 18:58:42,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0 2024-08-11 18:58:45,745 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 18:58:53,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1240280.0, ans=0.0 2024-08-11 18:58:56,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8100, loss[loss=0.09487, beats_loss=0.01069, ecapa_loss=0.0002234, whisper_loss=0.08195, over 20076.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001912, whisper_loss=0.09298, over 3871258.63 frames. 
], batch size: 81, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:59:04,209 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 18:59:04,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1240380.0, ans=0.0 2024-08-11 18:59:04,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1240380.0, ans=0.07 2024-08-11 18:59:05,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1240380.0, ans=0.125 2024-08-11 18:59:13,795 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 18:59:19,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1240480.0, ans=0.0 2024-08-11 18:59:32,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1240580.0, ans=0.0 2024-08-11 18:59:36,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0 2024-08-11 18:59:48,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1240780.0, ans=0.0 2024-08-11 18:59:57,434 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 18:59:58,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1240780.0, ans=0.2 2024-08-11 19:00:00,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2024-08-11 19:00:02,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8150, loss[loss=0.1034, beats_loss=0.01262, ecapa_loss=0.0001615, whisper_loss=0.08912, over 17571.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001911, whisper_loss=0.09254, over 3850252.62 frames. ], batch size: 67, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:00:04,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2024-08-11 19:00:23,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1240980.0, ans=0.2 2024-08-11 19:00:26,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.542e+01 2.871e+01 3.241e+01 4.432e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 19:00:32,128 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-11 19:00:36,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-08-11 19:01:02,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1241280.0, ans=0.1 2024-08-11 19:01:05,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1241280.0, ans=0.1 2024-08-11 19:01:08,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8200, loss[loss=0.1132, beats_loss=0.0124, ecapa_loss=0.0001801, whisper_loss=0.09897, over 13935.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01123, ecapa_loss=0.0001927, whisper_loss=0.09264, over 3843479.07 frames. 
], batch size: 53, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:01:17,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-11 19:01:24,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1241480.0, ans=0.125 2024-08-11 19:01:32,039 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 19:01:37,142 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 19:01:41,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1241580.0, ans=0.0 2024-08-11 19:01:45,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1241580.0, ans=10.0 2024-08-11 19:01:47,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1241680.0, ans=0.015 2024-08-11 19:01:53,699 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.877e-02 2024-08-11 19:02:14,345 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8250, loss[loss=0.1215, beats_loss=0.01099, ecapa_loss=0.0001932, whisper_loss=0.1086, over 17225.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01129, ecapa_loss=0.0001922, whisper_loss=0.09224, over 3853733.90 frames. ], batch size: 69, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:02:17,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-11 19:02:24,834 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 19:02:32,996 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 19:02:37,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.577e+01 2.823e+01 3.231e+01 7.611e+01, threshold=5.645e+01, percent-clipped=2.0 2024-08-11 19:02:39,319 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 19:02:41,700 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 27 from Vox, 17 fro AS 2024-08-11 19:03:01,404 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-11 19:03:11,159 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 19:03:16,352 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 19:03:19,943 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8300, loss[loss=0.131, beats_loss=0.009923, ecapa_loss=0.0002055, whisper_loss=0.119, over 22915.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01125, ecapa_loss=0.0001914, whisper_loss=0.09324, over 3892868.22 frames. ], batch size: 93, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:03:29,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1242380.0, ans=10.0 2024-08-11 19:03:30,919 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 13 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 19:03:39,948 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 19:03:43,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1242480.0, ans=0.1 2024-08-11 19:03:49,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1242580.0, ans=0.125 2024-08-11 19:03:57,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1242580.0, ans=0.0 2024-08-11 19:04:08,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1242680.0, ans=0.125 2024-08-11 19:04:15,644 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.220e-01 2024-08-11 19:04:16,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1242780.0, ans=0.2 2024-08-11 19:04:25,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8350, loss[loss=0.1207, beats_loss=0.01084, ecapa_loss=0.0001945, whisper_loss=0.1079, over 21466.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01125, ecapa_loss=0.0001926, whisper_loss=0.09275, over 3888257.55 frames. ], batch size: 86, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:04:30,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1242880.0, ans=0.0 2024-08-11 19:04:49,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.806e+01 3.050e+01 3.549e+01 1.399e+02, threshold=6.100e+01, percent-clipped=1.0 2024-08-11 19:05:00,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. 
limit=22.5 2024-08-11 19:05:30,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8400, loss[loss=0.07957, beats_loss=0.01144, ecapa_loss=0.0002297, whisper_loss=0.06583, over 13965.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01125, ecapa_loss=0.0001923, whisper_loss=0.09217, over 3887076.37 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:05:34,680 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 19:06:00,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1243580.0, ans=0.125 2024-08-11 19:06:19,781 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 19:06:36,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8450, loss[loss=0.1143, beats_loss=0.01066, ecapa_loss=0.0002033, whisper_loss=0.1016, over 22523.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0112, ecapa_loss=0.0001924, whisper_loss=0.09251, over 3879486.56 frames. ], batch size: 91, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:06:38,443 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 19:06:42,146 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 19:06:50,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-08-11 19:06:50,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2024-08-11 19:06:52,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. 
limit=15.0 2024-08-11 19:06:55,609 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-11 19:07:00,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.505e+01 2.848e+01 3.231e+01 4.188e+01, threshold=5.696e+01, percent-clipped=0.0 2024-08-11 19:07:00,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1243980.0, ans=0.125 2024-08-11 19:07:03,479 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 19:07:07,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1244080.0, ans=0.1 2024-08-11 19:07:10,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1244080.0, ans=0.125 2024-08-11 19:07:42,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8500, loss[loss=0.09343, beats_loss=0.01047, ecapa_loss=0.0001771, whisper_loss=0.08119, over 18577.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01133, ecapa_loss=0.0001912, whisper_loss=0.09166, over 3872373.03 frames. ], batch size: 71, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:07:58,796 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 19:08:03,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1244480.0, ans=0.125 2024-08-11 19:08:04,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.86 vs. 
limit=22.5 2024-08-11 19:08:33,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1244680.0, ans=0.125 2024-08-11 19:08:44,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-11 19:08:45,218 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 19:08:49,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8550, loss[loss=0.09816, beats_loss=0.01237, ecapa_loss=0.0001801, whisper_loss=0.08399, over 19331.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01137, ecapa_loss=0.0001906, whisper_loss=0.09105, over 3840817.69 frames. ], batch size: 80, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:57,175 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 42 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-11 19:09:04,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=22.5 2024-08-11 19:09:05,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1244980.0, ans=0.125 2024-08-11 19:09:13,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.649e+01 3.008e+01 3.594e+01 2.630e+02, threshold=6.016e+01, percent-clipped=2.0 2024-08-11 19:09:21,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2024-08-11 19:09:32,892 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 19:09:34,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1245180.0, ans=0.0 2024-08-11 19:09:47,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1245280.0, ans=0.125 2024-08-11 19:09:54,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8600, loss[loss=0.09804, beats_loss=0.009076, ecapa_loss=0.000219, whisper_loss=0.08678, over 20148.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01137, ecapa_loss=0.0001895, whisper_loss=0.09115, over 3856829.73 frames. ], batch size: 78, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:09:59,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-08-11 19:10:25,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1245580.0, ans=0.125 2024-08-11 19:10:27,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1245580.0, ans=0.05 2024-08-11 19:10:56,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1245780.0, ans=0.125 2024-08-11 19:11:01,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1245880.0, ans=0.1 2024-08-11 19:11:01,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8650, loss[loss=0.1058, beats_loss=0.007971, ecapa_loss=0.0002211, whisper_loss=0.0956, over 17108.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01137, ecapa_loss=0.0001899, whisper_loss=0.09183, over 3875634.33 frames. 
], batch size: 65, lr: 6.98e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:11:03,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1245880.0, ans=0.125 2024-08-11 19:11:09,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1245880.0, ans=0.0 2024-08-11 19:11:16,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-11 19:11:18,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-11 19:11:26,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.702e+01 2.920e+01 3.348e+01 5.833e+01, threshold=5.840e+01, percent-clipped=0.0 2024-08-11 19:11:39,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1246080.0, ans=0.125 2024-08-11 19:11:51,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1246180.0, ans=0.0 2024-08-11 19:11:51,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1246180.0, ans=0.0 2024-08-11 19:12:00,297 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 19:12:01,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1246280.0, ans=0.0 2024-08-11 19:12:12,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8700, loss[loss=0.1237, beats_loss=0.01034, ecapa_loss=0.0001858, whisper_loss=0.1115, over 22330.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001906, whisper_loss=0.09242, over 3873127.14 frames. ], batch size: 88, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:12:40,367 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 19:12:53,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1246580.0, ans=0.125 2024-08-11 19:12:54,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1246580.0, ans=0.0 2024-08-11 19:12:56,067 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 19:12:57,832 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 19:13:16,593 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 19:13:31,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8750, loss[loss=0.1107, beats_loss=0.008954, ecapa_loss=0.0002046, whisper_loss=0.09966, over 19771.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01127, ecapa_loss=0.0001925, whisper_loss=0.09227, over 3868294.73 frames. ], batch size: 76, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:13:31,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1246880.0, ans=0.0 2024-08-11 19:13:33,112 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 19:13:36,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.58 vs. 
limit=15.0 2024-08-11 19:14:02,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.729e+01 3.149e+01 3.725e+01 7.299e+01, threshold=6.297e+01, percent-clipped=2.0 2024-08-11 19:14:04,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1247080.0, ans=0.0 2024-08-11 19:14:07,527 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 19:14:12,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1247080.0, ans=0.025 2024-08-11 19:14:14,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1247080.0, ans=0.125 2024-08-11 19:14:32,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247180.0, ans=0.1 2024-08-11 19:14:55,095 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 19:14:56,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8800, loss[loss=0.0972, beats_loss=0.01205, ecapa_loss=0.0002048, whisper_loss=0.0831, over 20229.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001933, whisper_loss=0.09306, over 3852962.32 frames. ], batch size: 82, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:14:58,102 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 19:14:59,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1247380.0, ans=0.125 2024-08-11 19:15:03,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1247380.0, ans=0.125 2024-08-11 19:15:08,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1247380.0, ans=0.2 2024-08-11 19:15:10,091 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 19:15:16,367 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 19:15:34,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1247580.0, ans=0.0 2024-08-11 19:15:44,272 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 19:15:57,764 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-11 19:16:05,067 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 19:16:15,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1247780.0, ans=0.0 2024-08-11 19:16:19,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-11 19:16:21,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8850, loss[loss=0.1028, beats_loss=0.01151, ecapa_loss=0.0001695, whisper_loss=0.08956, over 14753.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0001911, whisper_loss=0.09283, over 3838575.90 frames. 
], batch size: 59, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:16:45,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-11 19:16:49,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1247980.0, ans=0.125 2024-08-11 19:16:49,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1247980.0, ans=0.125 2024-08-11 19:16:52,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.673e+01 2.972e+01 3.544e+01 5.278e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-11 19:16:59,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1248080.0, ans=0.0 2024-08-11 19:17:01,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1248080.0, ans=0.02 2024-08-11 19:17:20,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1248180.0, ans=0.125 2024-08-11 19:17:42,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.34 vs. limit=22.5 2024-08-11 19:17:47,794 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8900, loss[loss=0.1248, beats_loss=0.01255, ecapa_loss=0.000176, whisper_loss=0.1104, over 24209.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01122, ecapa_loss=0.0001913, whisper_loss=0.09324, over 3846170.55 frames. 
], batch size: 93, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:17:50,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1248380.0, ans=0.125 2024-08-11 19:18:09,974 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 19:18:19,751 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 19:18:20,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1248480.0, ans=0.125 2024-08-11 19:18:35,510 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 19:18:36,683 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 19:18:46,155 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 19:18:53,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1248680.0, ans=0.125 2024-08-11 19:18:55,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1248680.0, ans=0.125 2024-08-11 19:19:14,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-11 19:19:14,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 8950, loss[loss=0.1054, beats_loss=0.01092, ecapa_loss=0.0002168, whisper_loss=0.09226, over 17260.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01124, ecapa_loss=0.0001905, whisper_loss=0.09296, over 3842691.61 frames. ], batch size: 68, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:19:17,185 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
11 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 19:19:24,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1248880.0, ans=0.125 2024-08-11 19:19:28,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-08-11 19:19:35,663 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 19:19:39,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-11 19:19:43,105 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 19:19:44,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.588e+01 3.053e+01 3.414e+01 5.392e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-11 19:20:06,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2024-08-11 19:20:11,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=15.0 2024-08-11 19:20:16,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1249180.0, ans=0.125 2024-08-11 19:20:38,763 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9000, loss[loss=0.109, beats_loss=0.01145, ecapa_loss=0.0001914, whisper_loss=0.0956, over 22428.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01118, ecapa_loss=0.0001907, whisper_loss=0.09331, over 3855052.16 frames. 
], batch size: 92, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:20:38,764 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 19:21:20,527 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on ASR_libri: loss=0.2565, beats_loss=0, ecapa_loss=0.0006239, whisper_loss=0.2503, over 922467.00 frames. 2024-08-11 19:21:39,165 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on SV_voxceleb1: loss=0.005312, beats_loss=0, ecapa_loss=0.0005312, whisper_loss=0, over 939242.00 frames. 2024-08-11 19:23:36,250 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on AT_audioset: loss=0.02491, beats_loss=0.02491, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 19:23:36,254 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 19:23:55,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1249480.0, ans=0.0 2024-08-11 19:23:57,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2024-08-11 19:24:02,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-11 19:24:08,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1249580.0, ans=0.125 2024-08-11 19:24:31,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-11 19:24:38,838 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 19:25:00,850 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9050, loss[loss=0.1277, beats_loss=0.01009, ecapa_loss=0.0001869, whisper_loss=0.1157, over 21656.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001913, whisper_loss=0.09328, over 3877195.98 frames. ], batch size: 88, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:25:05,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=12.0 2024-08-11 19:25:10,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1249880.0, ans=0.125 2024-08-11 19:25:13,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1249880.0, ans=0.035 2024-08-11 19:25:22,772 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 19:25:32,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.548e+01 2.793e+01 3.280e+01 4.630e+01, threshold=5.586e+01, percent-clipped=0.0 2024-08-11 19:25:38,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-11 19:25:40,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1250080.0, ans=0.0 2024-08-11 19:25:45,392 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 19:26:07,037 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.775e+05 2024-08-11 19:26:22,507 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 19:26:24,345 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 19:26:26,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9100, loss[loss=0.1105, beats_loss=0.01005, ecapa_loss=0.0001628, whisper_loss=0.09883, over 21860.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01116, ecapa_loss=0.0001897, whisper_loss=0.09303, over 3885229.62 frames. ], batch size: 82, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:26:42,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1250380.0, ans=0.0 2024-08-11 19:26:48,575 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-11 19:26:56,356 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 19:27:12,782 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-11 19:27:26,918 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.203e-03 2024-08-11 19:27:39,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. 
limit=15.0 2024-08-11 19:27:44,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1250780.0, ans=0.0 2024-08-11 19:27:45,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1250780.0, ans=0.2 2024-08-11 19:27:48,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1250780.0, ans=0.1 2024-08-11 19:27:52,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9150, loss[loss=0.1234, beats_loss=0.01027, ecapa_loss=0.000239, whisper_loss=0.1107, over 22307.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001899, whisper_loss=0.09251, over 3869556.51 frames. ], batch size: 92, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:27:53,086 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 19:27:56,550 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 19:27:58,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-11 19:28:02,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. 
limit=15.0 2024-08-11 19:28:16,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1250980.0, ans=0.2 2024-08-11 19:28:23,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.605e+01 2.841e+01 3.221e+01 5.369e+01, threshold=5.683e+01, percent-clipped=0.0 2024-08-11 19:28:47,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1251180.0, ans=0.125 2024-08-11 19:29:01,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2024-08-11 19:29:03,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1251280.0, ans=0.1 2024-08-11 19:29:12,741 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9200, loss[loss=0.1129, beats_loss=0.01207, ecapa_loss=0.0002157, whisper_loss=0.09868, over 21992.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01121, ecapa_loss=0.0001911, whisper_loss=0.09226, over 3866566.82 frames. ], batch size: 90, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:29:17,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1251380.0, ans=0.0 2024-08-11 19:29:32,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1251480.0, ans=0.2 2024-08-11 19:29:42,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1251580.0, ans=0.0 2024-08-11 19:30:28,594 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9250, loss[loss=0.09484, beats_loss=0.01298, ecapa_loss=0.000204, whisper_loss=0.07982, over 21673.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0113, ecapa_loss=0.0001894, whisper_loss=0.09156, over 3848149.57 frames. 
], batch size: 89, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:30:32,107 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 19:30:36,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-08-11 19:30:37,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2024-08-11 19:30:57,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.691e+01 2.985e+01 3.626e+01 6.428e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 19:31:21,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1252180.0, ans=0.0 2024-08-11 19:31:22,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-11 19:31:26,757 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 19:31:30,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-08-11 19:31:44,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1252280.0, ans=0.125 2024-08-11 19:31:45,307 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 19:31:46,364 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9300, loss[loss=0.1247, beats_loss=0.01003, ecapa_loss=0.0001981, whisper_loss=0.1127, over 22177.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0113, ecapa_loss=0.0001898, whisper_loss=0.09223, over 3888720.55 frames. 
], batch size: 89, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:31:46,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1252380.0, ans=0.125 2024-08-11 19:31:50,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-11 19:31:57,435 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 19:32:09,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1252480.0, ans=0.1 2024-08-11 19:32:13,058 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 19:32:23,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1252580.0, ans=0.125 2024-08-11 19:32:26,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1252580.0, ans=0.125 2024-08-11 19:32:29,491 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-11 19:32:49,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1252780.0, ans=0.04949747468305833 2024-08-11 19:32:52,926 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 19:32:58,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1252780.0, ans=0.0 2024-08-11 19:33:05,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9350, loss[loss=0.1023, beats_loss=0.01129, ecapa_loss=0.0001962, whisper_loss=0.08907, over 16185.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01132, ecapa_loss=0.0001886, whisper_loss=0.09222, over 3867408.38 frames. ], batch size: 64, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:33:07,296 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.933e-01 2024-08-11 19:33:10,949 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 19:33:35,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.606e+01 3.008e+01 3.444e+01 5.189e+01, threshold=6.015e+01, percent-clipped=1.0 2024-08-11 19:34:13,315 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 19:34:18,284 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-11 19:34:22,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9400, loss[loss=0.0888, beats_loss=0.01468, ecapa_loss=0.0001679, whisper_loss=0.07245, over 22519.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01137, ecapa_loss=0.0001903, whisper_loss=0.09167, over 3880573.20 frames. ], batch size: 92, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:34:40,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1253480.0, ans=0.2 2024-08-11 19:34:44,231 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 19:34:51,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1253580.0, ans=0.125 2024-08-11 19:35:04,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1253580.0, ans=0.0 2024-08-11 19:35:08,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2024-08-11 19:35:23,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1253780.0, ans=0.125 2024-08-11 19:35:28,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1253780.0, ans=0.125 2024-08-11 19:35:31,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1253780.0, ans=0.125 2024-08-11 19:35:36,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1253880.0, ans=0.125 2024-08-11 19:35:37,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9450, loss[loss=0.1109, beats_loss=0.008459, ecapa_loss=0.0002794, whisper_loss=0.0996, over 16658.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01132, ecapa_loss=0.0001903, whisper_loss=0.09177, over 3856568.77 frames. ], batch size: 69, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:35:53,029 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 19:35:53,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1253980.0, ans=0.125 2024-08-11 19:35:55,844 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 19:36:01,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.693e+01 3.099e+01 3.778e+01 6.565e+01, threshold=6.199e+01, percent-clipped=1.0 2024-08-11 19:36:14,352 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 19:36:14,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-11 19:36:16,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1254180.0, ans=0.07 2024-08-11 19:36:17,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2024-08-11 19:36:43,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9500, loss[loss=0.1214, beats_loss=0.009487, ecapa_loss=0.0001765, whisper_loss=0.1101, over 24027.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01135, ecapa_loss=0.0001923, whisper_loss=0.09142, over 3891210.73 frames. ], batch size: 93, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:01,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1254480.0, ans=0.0 2024-08-11 19:37:13,896 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 19:37:24,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1254680.0, ans=0.0 2024-08-11 19:37:24,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1254680.0, ans=0.0 2024-08-11 19:37:41,925 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.335e+05 2024-08-11 19:37:49,037 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9550, loss[loss=0.1095, beats_loss=0.01249, ecapa_loss=0.0002291, whisper_loss=0.09471, over 16803.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01131, ecapa_loss=0.0001928, whisper_loss=0.09139, over 3858100.64 frames. ], batch size: 70, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:59,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1254880.0, ans=0.2 2024-08-11 19:38:01,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1254980.0, ans=0.125 2024-08-11 19:38:04,769 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 19:38:13,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.508e+01 2.726e+01 3.017e+01 8.338e+01, threshold=5.453e+01, percent-clipped=1.0 2024-08-11 19:38:36,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1255180.0, ans=0.07 2024-08-11 19:38:54,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9600, loss[loss=0.1119, beats_loss=0.01011, ecapa_loss=0.0001796, whisper_loss=0.09996, over 18946.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001916, whisper_loss=0.09202, over 3823062.70 frames. 
], batch size: 71, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:39:03,100 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 19 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-11 19:39:14,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1255480.0, ans=0.125 2024-08-11 19:39:19,328 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 19:39:29,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1255580.0, ans=0.125 2024-08-11 19:39:30,336 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 19:39:37,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1255680.0, ans=0.0 2024-08-11 19:39:38,209 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 19:39:38,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-11 19:39:51,479 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-11 19:39:54,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 19:40:01,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1255880.0, ans=0.1 2024-08-11 19:40:02,025 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9650, loss[loss=0.08093, beats_loss=0.01249, ecapa_loss=0.0002125, whisper_loss=0.06632, over 18506.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01123, ecapa_loss=0.0001928, whisper_loss=0.0917, over 3812773.92 frames. ], batch size: 80, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:40:05,078 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 19:40:17,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1255980.0, ans=0.125 2024-08-11 19:40:27,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.826e+01 3.085e+01 3.592e+01 1.036e+02, threshold=6.169e+01, percent-clipped=1.0 2024-08-11 19:40:29,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1256080.0, ans=0.125 2024-08-11 19:40:47,623 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 19:40:53,004 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 33 from Vox, 28 fro AS 2024-08-11 19:41:00,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1256280.0, ans=0.1 2024-08-11 19:41:02,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1256280.0, ans=0.125 2024-08-11 19:41:08,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9700, loss[loss=0.1117, beats_loss=0.01034, ecapa_loss=0.0001649, whisper_loss=0.09971, over 22860.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01121, ecapa_loss=0.0001923, whisper_loss=0.09181, over 3855544.85 frames. 
], batch size: 90, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:41:09,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1256380.0, ans=0.07 2024-08-11 19:41:09,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-11 19:41:40,031 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-08-11 19:41:52,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1256680.0, ans=0.0 2024-08-11 19:42:00,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1256780.0, ans=0.0 2024-08-11 19:42:02,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1256780.0, ans=0.125 2024-08-11 19:42:03,186 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 19:42:10,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1256780.0, ans=0.0 2024-08-11 19:42:14,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9750, loss[loss=0.1186, beats_loss=0.006979, ecapa_loss=0.000206, whisper_loss=0.1095, over 14566.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01114, ecapa_loss=0.0001928, whisper_loss=0.09201, over 3845713.27 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:42:20,591 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 19:42:36,614 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 19:42:39,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1256980.0, ans=0.125 2024-08-11 19:42:39,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0 2024-08-11 19:42:40,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.576e+01 2.817e+01 3.279e+01 5.572e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-11 19:42:41,929 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-11 19:42:44,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1257080.0, ans=0.035 2024-08-11 19:42:49,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1257080.0, ans=0.0 2024-08-11 19:42:52,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1257080.0, ans=0.0 2024-08-11 19:42:53,482 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 19:42:53,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-11 19:42:58,664 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:43:21,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9800, loss[loss=0.08924, beats_loss=0.01208, ecapa_loss=0.0002014, whisper_loss=0.07514, over 19181.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01124, ecapa_loss=0.0001914, whisper_loss=0.09149, over 3867649.35 frames. 
], batch size: 81, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:43:38,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1257480.0, ans=0.04949747468305833 2024-08-11 19:43:41,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1257480.0, ans=0.07 2024-08-11 19:43:42,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1257480.0, ans=0.125 2024-08-11 19:43:45,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1257580.0, ans=0.1 2024-08-11 19:43:49,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-08-11 19:44:26,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9850, loss[loss=0.1146, beats_loss=0.01158, ecapa_loss=0.0002189, whisper_loss=0.1008, over 21533.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01132, ecapa_loss=0.0001913, whisper_loss=0.09125, over 3860561.57 frames. ], batch size: 88, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:44:26,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1257880.0, ans=0.2 2024-08-11 19:44:34,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1257880.0, ans=0.125 2024-08-11 19:44:38,043 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 19:44:38,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1257980.0, ans=0.125 2024-08-11 19:44:41,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1257980.0, ans=0.125 2024-08-11 19:44:51,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.685e+01 3.037e+01 3.617e+01 4.839e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 19:45:12,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1258180.0, ans=0.09899494936611666 2024-08-11 19:45:16,430 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 19:45:31,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9900, loss[loss=0.08896, beats_loss=0.01346, ecapa_loss=0.000233, whisper_loss=0.07316, over 20616.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01128, ecapa_loss=0.0001916, whisper_loss=0.09192, over 3900249.42 frames. ], batch size: 89, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:45:43,722 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 19:45:48,002 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 19:45:54,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1258480.0, ans=0.125 2024-08-11 19:46:12,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1258680.0, ans=0.125 2024-08-11 19:46:12,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. 
limit=22.5 2024-08-11 19:46:25,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 19:46:28,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1258780.0, ans=0.125 2024-08-11 19:46:31,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1258780.0, ans=0.125 2024-08-11 19:46:35,290 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 19:46:36,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 9950, loss[loss=0.1023, beats_loss=0.01256, ecapa_loss=0.0001843, whisper_loss=0.0879, over 19252.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01126, ecapa_loss=0.0001919, whisper_loss=0.09154, over 3867582.44 frames. ], batch size: 77, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:46:41,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1258880.0, ans=0.0 2024-08-11 19:46:46,120 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 19:46:48,629 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 19:46:50,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1258980.0, ans=0.125 2024-08-11 19:47:01,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.542e+01 2.819e+01 3.280e+01 8.897e+01, threshold=5.637e+01, percent-clipped=1.0 2024-08-11 19:47:05,880 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 19:47:42,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10000, loss[loss=0.09222, beats_loss=0.01087, ecapa_loss=0.0002041, whisper_loss=0.07931, over 15480.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001937, whisper_loss=0.09286, over 3858241.49 frames. ], batch size: 59, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:47:44,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1259380.0, ans=0.5 2024-08-11 19:47:45,437 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 19:47:49,228 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 19:48:01,449 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 33 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 19:48:16,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1259580.0, ans=0.125 2024-08-11 19:48:16,961 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 19:48:19,547 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 19:48:23,593 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 19:48:35,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1259780.0, ans=0.0 2024-08-11 19:48:36,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1259780.0, ans=0.125 2024-08-11 19:48:40,089 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 19:48:47,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10050, loss[loss=0.1064, beats_loss=0.01211, ecapa_loss=0.0001609, whisper_loss=0.09268, over 23231.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01121, ecapa_loss=0.0001929, whisper_loss=0.09238, over 3864111.94 frames. ], batch size: 91, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:49:02,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1259980.0, ans=0.125 2024-08-11 19:49:03,518 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 33 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 19:49:11,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1259980.0, ans=0.04949747468305833 2024-08-11 19:49:12,417 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.712e+01 3.023e+01 3.510e+01 5.543e+01, threshold=6.045e+01, percent-clipped=0.0 2024-08-11 19:49:17,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1260080.0, ans=0.2 2024-08-11 19:49:17,939 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 19:49:25,833 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 12 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 19:49:46,935 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 19:49:52,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10100, loss[loss=0.1145, beats_loss=0.01188, ecapa_loss=0.0001901, whisper_loss=0.1008, over 22534.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01117, ecapa_loss=0.0001932, whisper_loss=0.09307, over 3893694.47 frames. 
], batch size: 93, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:49:57,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1260380.0, ans=0.0 2024-08-11 19:50:00,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1260380.0, ans=0.1 2024-08-11 19:50:01,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1260380.0, ans=0.1 2024-08-11 19:50:13,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1260480.0, ans=0.0 2024-08-11 19:50:16,612 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 19:50:37,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1260680.0, ans=0.2 2024-08-11 19:50:42,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-11 19:50:52,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1260780.0, ans=0.0 2024-08-11 19:50:58,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10150, loss[loss=0.1069, beats_loss=0.01268, ecapa_loss=0.0001822, whisper_loss=0.09244, over 22217.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01123, ecapa_loss=0.0001939, whisper_loss=0.09219, over 3866565.07 frames. ], batch size: 90, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:51:06,058 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 19:51:09,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1260880.0, ans=0.07 2024-08-11 19:51:23,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.653e+01 2.999e+01 3.558e+01 5.617e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 19:51:40,034 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 19:51:44,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1261180.0, ans=0.125 2024-08-11 19:51:56,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1261280.0, ans=0.1 2024-08-11 19:52:02,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2024-08-11 19:52:03,877 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10200, loss[loss=0.1116, beats_loss=0.008871, ecapa_loss=0.0001622, whisper_loss=0.1011, over 20297.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01123, ecapa_loss=0.0001928, whisper_loss=0.09236, over 3893529.63 frames. ], batch size: 76, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:52:08,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1261380.0, ans=0.125 2024-08-11 19:52:15,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1261480.0, ans=0.0 2024-08-11 19:52:22,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1261480.0, ans=0.1 2024-08-11 19:52:50,053 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 19:53:00,472 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.074e-01 2024-08-11 19:53:03,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1261780.0, ans=0.0 2024-08-11 19:53:03,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1261780.0, ans=0.2 2024-08-11 19:53:04,027 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 19:53:09,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10250, loss[loss=0.1132, beats_loss=0.009274, ecapa_loss=0.0002017, whisper_loss=0.1019, over 15190.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01123, ecapa_loss=0.0001921, whisper_loss=0.09241, over 3883862.59 frames. ], batch size: 57, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:53:13,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=12.0 2024-08-11 19:53:14,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1261880.0, ans=0.125 2024-08-11 19:53:23,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1261980.0, ans=0.0 2024-08-11 19:53:33,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.624e+01 2.927e+01 3.242e+01 1.065e+02, threshold=5.855e+01, percent-clipped=3.0 2024-08-11 19:53:35,875 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 19:53:44,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. 
limit=6.0 2024-08-11 19:53:47,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262180.0, ans=0.1 2024-08-11 19:54:04,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=15.0 2024-08-11 19:54:10,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1262280.0, ans=0.125 2024-08-11 19:54:10,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=1262280.0, ans=22.5 2024-08-11 19:54:12,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5 2024-08-11 19:54:15,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10300, loss[loss=0.1074, beats_loss=0.01174, ecapa_loss=0.0002024, whisper_loss=0.09363, over 22578.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01131, ecapa_loss=0.0001913, whisper_loss=0.09159, over 3889512.00 frames. ], batch size: 91, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:54:47,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1262580.0, ans=0.1 2024-08-11 19:55:00,336 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 19:55:01,614 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 19:55:07,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262780.0, ans=0.1 2024-08-11 19:55:20,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10350, loss[loss=0.07425, beats_loss=0.01458, ecapa_loss=0.0001743, whisper_loss=0.05793, over 22025.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01134, ecapa_loss=0.0001919, whisper_loss=0.09183, over 3914240.41 frames. ], batch size: 92, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:55:30,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.60 vs. limit=5.0 2024-08-11 19:55:45,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.735e+01 3.032e+01 3.459e+01 9.732e+01, threshold=6.064e+01, percent-clipped=1.0 2024-08-11 19:55:51,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1263080.0, ans=0.1 2024-08-11 19:55:53,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.37 vs. limit=12.0 2024-08-11 19:55:54,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263080.0, ans=0.125 2024-08-11 19:56:03,618 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-11 19:56:07,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1263180.0, ans=0.2 2024-08-11 19:56:09,765 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 19:56:15,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1263280.0, ans=0.125 2024-08-11 19:56:17,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2024-08-11 19:56:25,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs. limit=10.0 2024-08-11 19:56:26,462 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10400, loss[loss=0.1062, beats_loss=0.01272, ecapa_loss=0.0001298, whisper_loss=0.09217, over 23124.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01131, ecapa_loss=0.0001916, whisper_loss=0.09188, over 3921131.54 frames. ], batch size: 88, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:56:32,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1263380.0, ans=0.125 2024-08-11 19:56:37,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1263380.0, ans=0.1 2024-08-11 19:57:00,348 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 19:57:20,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1263780.0, ans=0.0 2024-08-11 19:57:23,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1263780.0, ans=0.125 2024-08-11 19:57:26,963 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 19:57:31,797 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10450, loss[loss=0.125, beats_loss=0.008599, ecapa_loss=0.0002177, whisper_loss=0.1142, over 23222.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01124, ecapa_loss=0.0001916, whisper_loss=0.0922, over 3871641.29 frames. ], batch size: 89, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:57:54,083 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 19:57:56,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.575e+01 2.883e+01 3.290e+01 7.177e+01, threshold=5.767e+01, percent-clipped=1.0 2024-08-11 19:57:56,840 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 19:58:20,315 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 19:58:36,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10500, loss[loss=0.08588, beats_loss=0.01369, ecapa_loss=0.0001489, whisper_loss=0.0707, over 22303.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01121, ecapa_loss=0.0001928, whisper_loss=0.09243, over 3899808.31 frames. ], batch size: 90, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:59:10,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1264580.0, ans=0.125 2024-08-11 19:59:13,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1264580.0, ans=0.0 2024-08-11 19:59:17,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1264680.0, ans=0.07 2024-08-11 19:59:20,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. 
limit=15.0 2024-08-11 19:59:23,944 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:59:26,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1264680.0, ans=0.0 2024-08-11 19:59:35,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1264780.0, ans=0.125 2024-08-11 19:59:40,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1264780.0, ans=0.07 2024-08-11 19:59:40,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1264780.0, ans=0.0 2024-08-11 19:59:43,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1264880.0, ans=0.125 2024-08-11 19:59:43,875 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10550, loss[loss=0.1183, beats_loss=0.01005, ecapa_loss=0.0001928, whisper_loss=0.1064, over 14118.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01129, ecapa_loss=0.0001923, whisper_loss=0.0915, over 3859874.51 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:59:50,087 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-11 19:59:53,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1264880.0, ans=0.0 2024-08-11 19:59:53,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2024-08-11 19:59:57,976 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 20:00:06,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1264980.0, ans=0.0 2024-08-11 20:00:08,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.610e+01 2.840e+01 3.443e+01 6.303e+01, threshold=5.679e+01, percent-clipped=1.0 2024-08-11 20:00:27,581 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.692e+00 2024-08-11 20:00:34,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-08-11 20:00:49,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10600, loss[loss=0.1306, beats_loss=0.009214, ecapa_loss=0.0001875, whisper_loss=0.1195, over 24049.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01127, ecapa_loss=0.0001923, whisper_loss=0.09155, over 3858401.87 frames. ], batch size: 94, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:01:12,989 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 20:01:52,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-11 20:01:55,629 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10650, loss[loss=0.1076, beats_loss=0.01112, ecapa_loss=0.0001836, whisper_loss=0.09463, over 16237.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01128, ecapa_loss=0.0001916, whisper_loss=0.09174, over 3865008.43 frames. ], batch size: 65, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:01:56,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1265880.0, ans=0.0 2024-08-11 20:02:06,597 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 20:02:09,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1265980.0, ans=0.0 2024-08-11 20:02:12,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265980.0, ans=0.1 2024-08-11 20:02:17,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.64 vs. limit=22.5 2024-08-11 20:02:21,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.841e+01 3.157e+01 3.812e+01 6.518e+01, threshold=6.314e+01, percent-clipped=4.0 2024-08-11 20:02:24,268 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 20:02:28,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1266080.0, ans=0.125 2024-08-11 20:02:42,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1266180.0, ans=0.0 2024-08-11 20:03:02,765 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10700, loss[loss=0.1035, beats_loss=0.01164, ecapa_loss=0.0001689, whisper_loss=0.09013, over 21774.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01128, ecapa_loss=0.0001919, whisper_loss=0.09218, over 3875514.01 frames. 
], batch size: 89, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:03:10,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1266380.0, ans=0.125 2024-08-11 20:03:16,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1266480.0, ans=0.125 2024-08-11 20:03:23,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-11 20:03:25,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1266480.0, ans=0.025 2024-08-11 20:03:50,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-11 20:03:54,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1266680.0, ans=0.0 2024-08-11 20:04:09,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10750, loss[loss=0.1221, beats_loss=0.007821, ecapa_loss=0.00021, whisper_loss=0.1122, over 17512.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.000193, whisper_loss=0.09259, over 3876206.43 frames. ], batch size: 69, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:04:16,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1266880.0, ans=0.125 2024-08-11 20:04:25,656 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 20:04:25,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1266980.0, ans=0.125 2024-08-11 20:04:26,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-11 20:04:35,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1266980.0, ans=0.125 2024-08-11 20:04:36,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.628e+01 2.928e+01 3.321e+01 7.388e+01, threshold=5.856e+01, percent-clipped=1.0 2024-08-11 20:04:39,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1267080.0, ans=0.125 2024-08-11 20:04:39,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1267080.0, ans=0.0 2024-08-11 20:04:42,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1267080.0, ans=0.125 2024-08-11 20:04:48,023 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 20:04:53,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1267180.0, ans=0.0 2024-08-11 20:05:06,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1267280.0, ans=0.2 2024-08-11 20:05:19,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10800, loss[loss=0.1254, beats_loss=0.008903, ecapa_loss=0.0001919, whisper_loss=0.1146, over 23721.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.0001924, whisper_loss=0.09267, over 3880596.00 frames. 
], batch size: 93, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:05:33,995 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 20:05:45,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1267480.0, ans=0.2 2024-08-11 20:05:46,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0 2024-08-11 20:06:25,798 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 20:06:26,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1267780.0, ans=0.125 2024-08-11 20:06:30,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1267780.0, ans=0.125 2024-08-11 20:06:35,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10850, loss[loss=0.1036, beats_loss=0.01156, ecapa_loss=0.0001587, whisper_loss=0.09042, over 22873.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01124, ecapa_loss=0.0001925, whisper_loss=0.09288, over 3901757.25 frames. ], batch size: 88, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:07:05,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.647e+01 2.915e+01 3.241e+01 5.191e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-11 20:07:10,649 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 20:07:33,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=12.0 2024-08-11 20:07:36,716 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 20:07:48,245 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 38 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 20:07:53,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10900, loss[loss=0.1059, beats_loss=0.009297, ecapa_loss=0.0002576, whisper_loss=0.09404, over 21767.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001914, whisper_loss=0.09274, over 3918374.40 frames. ], batch size: 91, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:08:02,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1268380.0, ans=15.0 2024-08-11 20:08:05,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1268380.0, ans=0.1 2024-08-11 20:08:37,694 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 20:09:13,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1268780.0, ans=0.125 2024-08-11 20:09:19,564 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 10950, loss[loss=0.1142, beats_loss=0.0106, ecapa_loss=0.0001757, whisper_loss=0.1019, over 22661.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.0001914, whisper_loss=0.0932, over 3918217.03 frames. ], batch size: 88, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:09:23,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1268880.0, ans=0.1 2024-08-11 20:09:34,048 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 20:09:37,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1268880.0, ans=0.125 2024-08-11 20:10:01,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.627e+01 3.007e+01 3.464e+01 1.236e+02, threshold=6.014e+01, percent-clipped=3.0 2024-08-11 20:10:03,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-11 20:10:07,297 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 20:10:07,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1269080.0, ans=0.125 2024-08-11 20:10:28,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1269180.0, ans=0.5 2024-08-11 20:10:39,482 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 20:10:51,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1269280.0, ans=0.0 2024-08-11 20:10:56,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1269280.0, ans=0.125 2024-08-11 20:11:02,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-08-11 20:11:08,351 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11000, loss[loss=0.1405, beats_loss=0.00933, ecapa_loss=0.000232, whisper_loss=0.1289, over 22380.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01121, ecapa_loss=0.0001927, whisper_loss=0.09346, over 3942423.45 frames. 
], batch size: 89, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:11:21,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1269380.0, ans=0.1 2024-08-11 20:11:22,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1269380.0, ans=0.0 2024-08-11 20:11:48,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1269580.0, ans=0.125 2024-08-11 20:12:02,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1269680.0, ans=0.0 2024-08-11 20:12:20,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1269680.0, ans=0.0 2024-08-11 20:12:33,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2024-08-11 20:12:38,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1269780.0, ans=0.0 2024-08-11 20:12:49,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11050, loss[loss=0.1065, beats_loss=0.009878, ecapa_loss=0.0001918, whisper_loss=0.09467, over 21864.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01123, ecapa_loss=0.0001912, whisper_loss=0.09346, over 3963738.72 frames. ], batch size: 88, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:13:07,096 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 20:13:11,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1269980.0, ans=0.125 2024-08-11 20:13:27,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1269980.0, ans=0.0 2024-08-11 20:13:33,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.512e+01 2.845e+01 3.437e+01 6.269e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-11 20:13:43,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1270080.0, ans=0.0 2024-08-11 20:13:46,085 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 20:14:09,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1270180.0, ans=0.125 2024-08-11 20:14:25,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1270280.0, ans=0.0 2024-08-11 20:14:31,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1270280.0, ans=0.1 2024-08-11 20:14:39,566 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 20:14:46,160 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11100, loss[loss=0.1272, beats_loss=0.007156, ecapa_loss=0.0002349, whisper_loss=0.1177, over 14082.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01116, ecapa_loss=0.0001923, whisper_loss=0.09294, over 3943680.07 frames. 
], batch size: 56, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:15:32,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1270480.0, ans=0.125 2024-08-11 20:15:32,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1270480.0, ans=0.1 2024-08-11 20:15:46,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1270580.0, ans=0.2 2024-08-11 20:15:56,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1270580.0, ans=0.125 2024-08-11 20:16:03,036 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 20:16:18,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-11 20:16:21,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1270680.0, ans=0.125 2024-08-11 20:16:33,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1270780.0, ans=0.125 2024-08-11 20:16:47,629 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11150, loss[loss=0.1266, beats_loss=0.01015, ecapa_loss=0.0001766, whisper_loss=0.1147, over 23223.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01111, ecapa_loss=0.0001912, whisper_loss=0.094, over 3939693.75 frames. ], batch size: 91, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:16:55,200 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 20:17:00,587 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.539e-01 2024-08-11 20:17:08,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1270880.0, ans=0.07 2024-08-11 20:17:31,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1270980.0, ans=0.07 2024-08-11 20:17:39,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.507e+01 2.811e+01 3.221e+01 4.609e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-11 20:17:48,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0 2024-08-11 20:17:54,270 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 20:18:06,024 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 20:18:10,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1271180.0, ans=0.1 2024-08-11 20:18:21,438 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 20:18:21,680 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 20:18:34,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1271280.0, ans=0.0 2024-08-11 20:18:35,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1271380.0, ans=0.125 2024-08-11 20:18:36,889 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11200, loss[loss=0.1057, beats_loss=0.01087, ecapa_loss=0.0001935, whisper_loss=0.0929, over 22359.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01107, ecapa_loss=0.0001912, whisper_loss=0.0941, over 3942826.69 frames. ], batch size: 92, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:19:25,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1271580.0, ans=0.0 2024-08-11 20:19:36,290 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 20:19:47,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1271780.0, ans=0.0 2024-08-11 20:19:58,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1271780.0, ans=0.0 2024-08-11 20:20:05,102 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11250, loss[loss=0.08601, beats_loss=0.01129, ecapa_loss=0.000178, whisper_loss=0.07293, over 17978.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01108, ecapa_loss=0.0001913, whisper_loss=0.0942, over 3873656.54 frames. 
], batch size: 73, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:20:38,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.603e+01 2.926e+01 3.414e+01 6.111e+01, threshold=5.851e+01, percent-clipped=1.0 2024-08-11 20:20:38,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1272080.0, ans=0.0 2024-08-11 20:21:25,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2024-08-11 20:21:27,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1272280.0, ans=0.1 2024-08-11 20:21:30,620 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-11 20:21:32,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1272380.0, ans=0.125 2024-08-11 20:21:34,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11300, loss[loss=0.1066, beats_loss=0.01283, ecapa_loss=0.0001996, whisper_loss=0.09176, over 20460.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01101, ecapa_loss=0.0001919, whisper_loss=0.09395, over 3866348.03 frames. ], batch size: 84, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:22:25,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2024-08-11 20:23:02,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1272780.0, ans=0.125 2024-08-11 20:23:04,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11350, loss[loss=0.09534, beats_loss=0.01125, ecapa_loss=0.0001944, whisper_loss=0.08215, over 17440.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.0109, ecapa_loss=0.0001939, whisper_loss=0.09477, over 3895777.10 frames. ], batch size: 73, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:23:25,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-11 20:23:36,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1272980.0, ans=0.125 2024-08-11 20:23:39,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.545e+01 2.892e+01 3.550e+01 1.179e+02, threshold=5.785e+01, percent-clipped=1.0 2024-08-11 20:23:41,268 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 20:23:44,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=22.5 2024-08-11 20:23:50,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2024-08-11 20:24:25,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1273280.0, ans=0.0 2024-08-11 20:24:26,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1273280.0, ans=0.125 2024-08-11 20:24:35,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11400, loss[loss=0.1173, beats_loss=0.008464, ecapa_loss=0.0001854, whisper_loss=0.107, over 14836.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01096, ecapa_loss=0.0001921, whisper_loss=0.09421, over 3872055.54 frames. 
], batch size: 58, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:24:50,525 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.127e-01 2024-08-11 20:25:46,079 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 20:25:57,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1273780.0, ans=0.125 2024-08-11 20:26:03,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11450, loss[loss=0.1108, beats_loss=0.01002, ecapa_loss=0.0002055, whisper_loss=0.09877, over 21876.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01107, ecapa_loss=0.0001926, whisper_loss=0.09393, over 3908019.08 frames. ], batch size: 89, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:26:08,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1273880.0, ans=0.125 2024-08-11 20:26:38,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.725e+01 3.153e+01 3.598e+01 9.857e+01, threshold=6.305e+01, percent-clipped=2.0 2024-08-11 20:26:50,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-11 20:27:00,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1274180.0, ans=0.0 2024-08-11 20:27:14,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1274280.0, ans=0.0 2024-08-11 20:27:17,671 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 20:27:23,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1274280.0, ans=12.0 2024-08-11 20:27:29,605 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 20:27:32,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11500, loss[loss=0.08845, beats_loss=0.01355, ecapa_loss=0.0001994, whisper_loss=0.0729, over 16774.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01104, ecapa_loss=0.0001917, whisper_loss=0.09457, over 3896359.23 frames. ], batch size: 67, lr: 6.91e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:27:33,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1274380.0, ans=0.125 2024-08-11 20:27:52,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1274480.0, ans=0.0 2024-08-11 20:27:54,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1274480.0, ans=15.0 2024-08-11 20:28:12,498 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 20:28:33,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1274680.0, ans=0.025 2024-08-11 20:29:07,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11550, loss[loss=0.1087, beats_loss=0.0119, ecapa_loss=0.000181, whisper_loss=0.09498, over 22213.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01106, ecapa_loss=0.000191, whisper_loss=0.09452, over 3885408.86 frames. 
], batch size: 89, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:29:10,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1274880.0, ans=0.2 2024-08-11 20:29:14,975 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 20:29:43,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.690e+01 2.944e+01 3.463e+01 4.757e+01, threshold=5.888e+01, percent-clipped=0.0 2024-08-11 20:29:51,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1275080.0, ans=0.0 2024-08-11 20:29:53,691 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 20:29:54,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2024-08-11 20:30:15,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1275180.0, ans=0.125 2024-08-11 20:30:25,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1275280.0, ans=0.125 2024-08-11 20:30:37,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11600, loss[loss=0.1087, beats_loss=0.01021, ecapa_loss=0.0002363, whisper_loss=0.09608, over 20717.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01101, ecapa_loss=0.0001905, whisper_loss=0.09468, over 3871007.84 frames. 
], batch size: 88, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:30:38,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1275380.0, ans=0.2 2024-08-11 20:30:43,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1275380.0, ans=0.125 2024-08-11 20:30:56,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1275480.0, ans=0.0 2024-08-11 20:30:58,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1275480.0, ans=0.5 2024-08-11 20:31:28,461 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 20:31:28,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1275680.0, ans=0.1 2024-08-11 20:31:35,104 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 29 from LS+wenet, 8 from Vox, 31 fro AS 2024-08-11 20:31:44,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1275680.0, ans=0.125 2024-08-11 20:31:46,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.75 vs. limit=22.5 2024-08-11 20:31:54,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-08-11 20:31:59,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1275780.0, ans=0.0 2024-08-11 20:32:06,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11650, loss[loss=0.09947, beats_loss=0.01151, ecapa_loss=0.0001474, whisper_loss=0.08649, over 19748.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01112, ecapa_loss=0.0001908, whisper_loss=0.09443, over 3882037.96 frames. ], batch size: 75, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:32:22,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=22.5 2024-08-11 20:32:38,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1275980.0, ans=0.125 2024-08-11 20:32:44,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.564e+01 2.809e+01 3.170e+01 4.570e+01, threshold=5.617e+01, percent-clipped=0.0 2024-08-11 20:32:44,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1276080.0, ans=0.125 2024-08-11 20:32:46,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1276080.0, ans=0.125 2024-08-11 20:32:59,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1276080.0, ans=0.125 2024-08-11 20:33:05,789 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 20:33:09,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1276180.0, ans=0.1 2024-08-11 20:33:13,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1276180.0, ans=0.0 2024-08-11 20:33:39,826 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 20:33:43,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11700, loss[loss=0.101, beats_loss=0.01355, ecapa_loss=0.0001857, whisper_loss=0.08556, over 21491.00 frames. 
], tot_loss[loss=0.1076, beats_loss=0.01117, ecapa_loss=0.0001903, whisper_loss=0.09449, over 3889103.43 frames. ], batch size: 87, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:33:46,580 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 20:33:59,391 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 20:34:15,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0 2024-08-11 20:34:19,431 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 20:34:20,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. limit=15.0 2024-08-11 20:34:23,483 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 20:34:25,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1276580.0, ans=0.125 2024-08-11 20:34:28,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1276580.0, ans=0.07 2024-08-11 20:34:45,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276680.0, ans=0.1 2024-08-11 20:34:47,044 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 20:34:52,916 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 20:35:00,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1276780.0, ans=0.125 2024-08-11 20:35:13,196 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 20:35:14,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11750, loss[loss=0.1151, beats_loss=0.01066, ecapa_loss=0.0002197, whisper_loss=0.1022, over 21985.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01129, ecapa_loss=0.00019, whisper_loss=0.09426, over 3903775.40 frames. ], batch size: 89, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:35:16,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1276880.0, ans=0.1 2024-08-11 20:35:18,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1276880.0, ans=0.125 2024-08-11 20:35:44,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1276980.0, ans=0.125 2024-08-11 20:35:49,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.651e+01 2.904e+01 3.391e+01 1.042e+02, threshold=5.808e+01, percent-clipped=2.0 2024-08-11 20:36:07,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1277180.0, ans=0.125 2024-08-11 20:36:29,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1277280.0, ans=0.0 2024-08-11 20:36:43,555 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11800, loss[loss=0.07258, beats_loss=0.01554, ecapa_loss=0.0001363, whisper_loss=0.05568, over 20716.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.0001898, whisper_loss=0.09364, over 3911153.79 frames. ], batch size: 81, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:36:47,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1277380.0, ans=0.125 2024-08-11 20:36:48,718 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 20:37:01,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1277480.0, ans=0.0 2024-08-11 20:37:10,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1277480.0, ans=0.0 2024-08-11 20:37:35,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1277680.0, ans=0.0 2024-08-11 20:37:37,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 20:37:40,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2024-08-11 20:37:48,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1277680.0, ans=0.125 2024-08-11 20:37:58,418 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 20:38:12,016 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11850, loss[loss=0.09799, beats_loss=0.01311, ecapa_loss=0.0002028, whisper_loss=0.08285, over 21665.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01137, ecapa_loss=0.0001895, whisper_loss=0.09336, over 3887721.66 frames. 
], batch size: 92, lr: 6.90e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:38:18,695 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 20:38:33,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1277980.0, ans=0.0 2024-08-11 20:38:40,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-11 20:38:43,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.625e+01 2.967e+01 3.340e+01 5.309e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 20:38:54,413 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 20:38:59,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1278080.0, ans=0.125 2024-08-11 20:39:08,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1278180.0, ans=0.0 2024-08-11 20:39:19,057 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 11 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 20:39:21,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1278280.0, ans=0.0 2024-08-11 20:39:27,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1278280.0, ans=0.0 2024-08-11 20:39:38,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11900, loss[loss=0.111, beats_loss=0.01158, ecapa_loss=0.0001585, whisper_loss=0.0978, over 19191.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.0001899, whisper_loss=0.09348, over 3908051.95 frames. 
], batch size: 71, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:39:40,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1278380.0, ans=0.0 2024-08-11 20:39:59,398 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 20:40:28,980 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 20:40:39,688 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 20:40:42,668 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 20:41:00,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1278780.0, ans=0.125 2024-08-11 20:41:03,512 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 11950, loss[loss=0.09808, beats_loss=0.01257, ecapa_loss=0.000219, whisper_loss=0.08333, over 12712.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001908, whisper_loss=0.09283, over 3874467.59 frames. ], batch size: 54, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:41:03,685 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 20:41:12,205 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 20:41:19,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1278980.0, ans=0.2 2024-08-11 20:41:28,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1278980.0, ans=0.2 2024-08-11 20:41:33,695 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
45 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 20:41:37,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.570e+01 2.836e+01 3.237e+01 6.228e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-11 20:41:43,118 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 20:41:51,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1279080.0, ans=0.1 2024-08-11 20:41:54,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-08-11 20:42:03,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1279180.0, ans=0.125 2024-08-11 20:42:28,234 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 20:42:33,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12000, loss[loss=0.1256, beats_loss=0.009374, ecapa_loss=0.0001629, whisper_loss=0.1146, over 23515.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01116, ecapa_loss=0.0001915, whisper_loss=0.09361, over 3842195.78 frames. ], batch size: 88, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:42:33,182 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 20:42:54,798 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8206, 3.6820, 4.5086, 4.3184], device='cuda:1') 2024-08-11 20:43:16,176 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0006123, whisper_loss=0.25, over 922467.00 frames. 2024-08-11 20:43:35,170 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on SV_voxceleb1: loss=0.005094, beats_loss=0, ecapa_loss=0.0005094, whisper_loss=0, over 939242.00 frames. 
2024-08-11 20:45:30,564 INFO [train_multi_KD3.py:1149] (1/4) Epoch 9, validation on AT_audioset: loss=0.02487, beats_loss=0.02487, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 20:45:30,568 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 20:45:34,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1279380.0, ans=0.0 2024-08-11 20:45:51,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1279480.0, ans=0.125 2024-08-11 20:45:59,076 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 20:46:06,964 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 20:46:21,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1279680.0, ans=0.0 2024-08-11 20:46:37,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0 2024-08-11 20:46:54,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1279780.0, ans=0.1 2024-08-11 20:46:59,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12050, loss[loss=0.08273, beats_loss=0.01174, ecapa_loss=0.000185, whisper_loss=0.06914, over 15717.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.0001913, whisper_loss=0.0929, over 3805345.34 frames. ], batch size: 66, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:47:14,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1279980.0, ans=0.125 2024-08-11 20:47:24,195 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 20:47:32,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.712e+01 3.113e+01 3.609e+01 6.588e+01, threshold=6.227e+01, percent-clipped=3.0 2024-08-11 20:47:42,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-08-11 20:48:00,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-08-11 20:48:04,859 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 20:48:17,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1280280.0, ans=0.04949747468305833 2024-08-11 20:48:27,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12100, loss[loss=0.1008, beats_loss=0.01197, ecapa_loss=0.0001796, whisper_loss=0.08707, over 19627.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001899, whisper_loss=0.09258, over 3830164.71 frames. 
], batch size: 74, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:48:49,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1280480.0, ans=0.07 2024-08-11 20:48:51,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1280480.0, ans=0.125 2024-08-11 20:49:24,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1280680.0, ans=0.2 2024-08-11 20:49:31,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1280680.0, ans=0.0 2024-08-11 20:49:33,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2024-08-11 20:49:36,102 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 20:49:41,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1280780.0, ans=0.125 2024-08-11 20:49:45,071 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 20:49:54,998 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12150, loss[loss=0.08334, beats_loss=0.01221, ecapa_loss=0.0001949, whisper_loss=0.06919, over 19161.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001901, whisper_loss=0.09318, over 3828857.68 frames. ], batch size: 78, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:49:57,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1280880.0, ans=0.0 2024-08-11 20:50:00,568 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 20:50:04,871 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 20:50:12,174 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 20:50:26,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.646e+01 3.020e+01 3.424e+01 5.278e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 20:50:48,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2024-08-11 20:50:50,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1281180.0, ans=0.05 2024-08-11 20:51:02,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1281280.0, ans=0.125 2024-08-11 20:51:09,023 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-11 20:51:16,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1281280.0, ans=0.125 2024-08-11 20:51:19,629 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12200, loss[loss=0.1009, beats_loss=0.0101, ecapa_loss=0.0001677, whisper_loss=0.08915, over 17106.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01105, ecapa_loss=0.000192, whisper_loss=0.09341, over 3829914.61 frames. 
], batch size: 67, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:51:19,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1281380.0, ans=0.015 2024-08-11 20:51:20,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281380.0, ans=0.1 2024-08-11 20:51:32,726 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 20:51:38,865 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 20:51:43,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-11 20:51:54,106 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 20:52:22,842 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 20:52:26,242 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 20:52:36,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-11 20:52:39,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1281780.0, ans=0.125 2024-08-11 20:52:42,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1281880.0, ans=0.125 2024-08-11 20:52:43,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12250, loss[loss=0.1176, beats_loss=0.01013, ecapa_loss=0.0002118, whisper_loss=0.1053, over 22721.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01104, ecapa_loss=0.000191, whisper_loss=0.09369, over 3846494.79 frames. ], batch size: 92, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:52:46,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1281880.0, ans=0.125 2024-08-11 20:52:52,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1281880.0, ans=0.0 2024-08-11 20:52:53,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281880.0, ans=0.1 2024-08-11 20:53:16,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.578e+01 2.932e+01 3.420e+01 1.649e+02, threshold=5.864e+01, percent-clipped=1.0 2024-08-11 20:53:31,331 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 20:53:31,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1282080.0, ans=0.125 2024-08-11 20:53:57,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1282280.0, ans=0.0 2024-08-11 20:53:58,851 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 20:54:08,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12300, loss[loss=0.1017, beats_loss=0.009087, ecapa_loss=0.0002297, whisper_loss=0.09028, over 22038.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01102, ecapa_loss=0.0001921, whisper_loss=0.09359, over 3825466.29 frames. ], batch size: 90, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:54:21,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.88 vs. 
limit=22.5 2024-08-11 20:54:26,481 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 20:54:54,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1282580.0, ans=0.5 2024-08-11 20:55:11,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1282680.0, ans=0.0 2024-08-11 20:55:19,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2024-08-11 20:55:25,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1282780.0, ans=0.5 2024-08-11 20:55:26,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1282780.0, ans=0.0 2024-08-11 20:55:35,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12350, loss[loss=0.09306, beats_loss=0.01188, ecapa_loss=0.0001545, whisper_loss=0.07964, over 18349.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01107, ecapa_loss=0.0001936, whisper_loss=0.09314, over 3827435.26 frames. ], batch size: 71, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:55:47,759 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 20:56:06,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.524e+01 2.954e+01 3.299e+01 5.655e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 20:56:17,291 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 20:56:17,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1283080.0, ans=10.0 2024-08-11 20:56:21,637 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 14 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-11 20:56:22,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2024-08-11 20:56:42,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1283280.0, ans=0.0 2024-08-11 20:57:01,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12400, loss[loss=0.07909, beats_loss=0.01168, ecapa_loss=0.000204, whisper_loss=0.06537, over 16817.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.0001918, whisper_loss=0.09324, over 3817855.96 frames. ], batch size: 68, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:57:09,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1283380.0, ans=0.125 2024-08-11 20:57:40,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1283580.0, ans=0.0 2024-08-11 20:58:23,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1283780.0, ans=0.1 2024-08-11 20:58:25,686 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12450, loss[loss=0.1128, beats_loss=0.01033, ecapa_loss=0.0001739, whisper_loss=0.1007, over 16126.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001921, whisper_loss=0.09232, over 3860556.46 frames. 
], batch size: 61, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:58:34,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2024-08-11 20:58:41,657 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 20:58:56,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.629e+01 2.973e+01 3.425e+01 5.618e+01, threshold=5.946e+01, percent-clipped=0.0 2024-08-11 20:58:57,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1284080.0, ans=0.0 2024-08-11 20:59:11,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1284080.0, ans=0.125 2024-08-11 20:59:19,913 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-11 20:59:21,433 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 20:59:28,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1284180.0, ans=0.0 2024-08-11 20:59:29,602 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 20:59:48,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12500, loss[loss=0.09523, beats_loss=0.01257, ecapa_loss=0.0001965, whisper_loss=0.08069, over 21782.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01121, ecapa_loss=0.0001905, whisper_loss=0.09157, over 3888703.57 frames. ], batch size: 91, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:59:53,695 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 21:00:06,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1284480.0, ans=0.125 2024-08-11 21:00:09,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1284480.0, ans=0.1 2024-08-11 21:00:13,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1284480.0, ans=0.0 2024-08-11 21:00:16,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1284480.0, ans=0.125 2024-08-11 21:00:27,254 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 21:00:33,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1284580.0, ans=0.1 2024-08-11 21:00:52,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1284680.0, ans=0.0 2024-08-11 21:00:52,927 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 21:01:14,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12550, loss[loss=0.08504, beats_loss=0.01307, ecapa_loss=0.0001926, whisper_loss=0.07004, over 16200.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.0001912, whisper_loss=0.09197, over 3884303.01 frames. ], batch size: 68, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:01:22,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1284880.0, ans=0.125 2024-08-11 21:01:29,893 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
11 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 21:01:44,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.709e+01 3.086e+01 3.503e+01 6.566e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 21:01:45,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1285080.0, ans=0.125 2024-08-11 21:02:00,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1285080.0, ans=0.125 2024-08-11 21:02:35,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12600, loss[loss=0.09558, beats_loss=0.01247, ecapa_loss=0.0001494, whisper_loss=0.08161, over 20220.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01116, ecapa_loss=0.0001907, whisper_loss=0.09176, over 3873838.44 frames. ], batch size: 80, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:02:41,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1285380.0, ans=0.125 2024-08-11 21:02:50,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1285480.0, ans=0.0 2024-08-11 21:02:57,968 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 21:03:04,708 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 21:03:28,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2024-08-11 21:03:57,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12650, loss[loss=0.1028, beats_loss=0.01259, ecapa_loss=0.0001796, whisper_loss=0.08841, over 17572.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01118, ecapa_loss=0.0001912, whisper_loss=0.09203, over 3875301.64 frames. 
], batch size: 72, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:03:59,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1285880.0, ans=0.125 2024-08-11 21:04:11,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-11 21:04:17,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285980.0, ans=0.1 2024-08-11 21:04:22,834 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 21:04:25,416 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-11 21:04:31,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.568e+01 2.843e+01 3.370e+01 6.340e+01, threshold=5.685e+01, percent-clipped=1.0 2024-08-11 21:05:05,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1286180.0, ans=0.0 2024-08-11 21:05:06,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1286280.0, ans=0.125 2024-08-11 21:05:25,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12700, loss[loss=0.1302, beats_loss=0.006696, ecapa_loss=0.0002151, whisper_loss=0.1214, over 16425.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01121, ecapa_loss=0.0001912, whisper_loss=0.09207, over 3848774.89 frames. ], batch size: 64, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:05:28,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1286380.0, ans=0.125 2024-08-11 21:05:32,125 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
12 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 21:05:32,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1286380.0, ans=0.07 2024-08-11 21:05:35,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-11 21:05:37,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1286380.0, ans=0.125 2024-08-11 21:05:50,212 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 21:05:56,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1286580.0, ans=0.125 2024-08-11 21:06:09,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1286580.0, ans=0.125 2024-08-11 21:06:11,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-11 21:06:19,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-11 21:06:26,570 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2024-08-11 21:06:33,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2024-08-11 21:06:47,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12750, loss[loss=0.1037, beats_loss=0.01136, ecapa_loss=0.0002222, whisper_loss=0.09013, over 20136.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01129, ecapa_loss=0.0001914, whisper_loss=0.09227, over 3871964.00 frames. ], batch size: 84, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:06:58,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1286880.0, ans=0.1 2024-08-11 21:07:03,936 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 21:07:05,513 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 21:07:08,649 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 21:07:16,200 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-11 21:07:16,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1286980.0, ans=0.2 2024-08-11 21:07:19,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.648e+01 3.001e+01 3.436e+01 1.023e+02, threshold=6.002e+01, percent-clipped=1.0 2024-08-11 21:07:25,063 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 21:07:38,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1287180.0, ans=0.125 2024-08-11 21:08:16,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12800, loss[loss=0.1172, beats_loss=0.01132, ecapa_loss=0.0001835, whisper_loss=0.1041, over 22483.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001925, whisper_loss=0.09302, over 3866978.49 frames. ], batch size: 90, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:08:38,251 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 21:09:07,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1287680.0, ans=0.125 2024-08-11 21:09:07,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1287680.0, ans=0.07 2024-08-11 21:09:13,242 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 21:09:25,578 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 21:09:27,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1287780.0, ans=0.125 2024-08-11 21:09:31,383 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 21:09:33,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1287780.0, ans=0.125 2024-08-11 21:09:36,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12850, loss[loss=0.1241, beats_loss=0.008111, ecapa_loss=0.0001991, whisper_loss=0.114, over 16341.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01124, ecapa_loss=0.0001917, whisper_loss=0.09221, over 3855891.99 frames. 
], batch size: 63, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:09:41,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1287880.0, ans=0.125 2024-08-11 21:09:44,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1287880.0, ans=0.125 2024-08-11 21:09:55,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1287980.0, ans=0.0 2024-08-11 21:10:09,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.541e+01 2.885e+01 3.297e+01 4.788e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-11 21:10:49,089 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 33 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 21:10:53,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1288280.0, ans=0.0 2024-08-11 21:10:57,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1288280.0, ans=0.0 2024-08-11 21:10:58,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1288380.0, ans=0.125 2024-08-11 21:11:00,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12900, loss[loss=0.1164, beats_loss=0.01096, ecapa_loss=0.0001983, whisper_loss=0.1034, over 19089.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01126, ecapa_loss=0.0001923, whisper_loss=0.09252, over 3888761.14 frames. ], batch size: 78, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:11:04,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.05 vs. 
limit=15.0 2024-08-11 21:11:13,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1288380.0, ans=0.125 2024-08-11 21:11:30,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1288580.0, ans=0.0 2024-08-11 21:11:47,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-11 21:12:11,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1288780.0, ans=0.0 2024-08-11 21:12:19,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1288780.0, ans=0.2 2024-08-11 21:12:21,625 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 12950, loss[loss=0.09913, beats_loss=0.01378, ecapa_loss=0.000171, whisper_loss=0.08364, over 23458.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0113, ecapa_loss=0.000191, whisper_loss=0.09255, over 3905197.57 frames. ], batch size: 95, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:12:27,331 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 21:12:40,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1288980.0, ans=0.125 2024-08-11 21:12:44,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2024-08-11 21:12:54,624 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.715e+01 3.125e+01 3.606e+01 5.827e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 21:13:12,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1289180.0, ans=0.07 2024-08-11 21:13:20,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=12.0 2024-08-11 21:13:42,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1289280.0, ans=0.0 2024-08-11 21:13:45,781 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13000, loss[loss=0.08134, beats_loss=0.0123, ecapa_loss=0.0002136, whisper_loss=0.0669, over 13523.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01133, ecapa_loss=0.0001908, whisper_loss=0.0919, over 3889131.13 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:14:07,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1289480.0, ans=0.0 2024-08-11 21:14:09,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1289480.0, ans=0.125 2024-08-11 21:14:16,743 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 21:14:17,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1289480.0, ans=0.125 2024-08-11 21:14:22,653 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 16 from Vox, 26 from AS 2024-08-11 21:14:23,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.57 vs. 
limit=12.0 2024-08-11 21:14:25,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2024-08-11 21:14:26,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1289580.0, ans=0.0 2024-08-11 21:15:12,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13050, loss[loss=0.1094, beats_loss=0.01316, ecapa_loss=0.0002049, whisper_loss=0.09415, over 22024.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0113, ecapa_loss=0.0001889, whisper_loss=0.09255, over 3887007.57 frames. ], batch size: 93, lr: 6.86e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:15:23,406 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 from AS 2024-08-11 21:15:25,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-08-11 21:15:26,441 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 from AS 2024-08-11 21:15:30,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1289980.0, ans=0.09899494936611666 2024-08-11 21:15:37,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-11 21:15:41,168 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 21:15:43,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.491e+01 2.763e+01 3.152e+01 5.442e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-11 21:15:52,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1290080.0, ans=0.0 2024-08-11 21:15:57,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1290080.0, ans=0.1 2024-08-11 21:16:10,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1290180.0, ans=0.2 2024-08-11 21:16:11,925 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.116e+01 2024-08-11 21:16:20,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1290280.0, ans=0.0 2024-08-11 21:16:34,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13100, loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.000186, whisper_loss=0.08985, over 17448.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01128, ecapa_loss=0.0001895, whisper_loss=0.09204, over 3867680.59 frames. 
], batch size: 68, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:17:10,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1290580.0, ans=0.125 2024-08-11 21:17:10,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1290580.0, ans=0.0 2024-08-11 21:17:17,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1290580.0, ans=0.0 2024-08-11 21:17:27,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1290680.0, ans=0.0 2024-08-11 21:18:02,067 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13150, loss[loss=0.1248, beats_loss=0.01058, ecapa_loss=0.0001788, whisper_loss=0.1124, over 22853.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01133, ecapa_loss=0.0001885, whisper_loss=0.09205, over 3902627.39 frames. ], batch size: 91, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:18:06,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.17 vs. 
limit=15.0 2024-08-11 21:18:07,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1290880.0, ans=0.125 2024-08-11 21:18:07,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1290880.0, ans=0.125 2024-08-11 21:18:24,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1290980.0, ans=0.1 2024-08-11 21:18:36,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.521e+01 2.887e+01 3.350e+01 6.017e+01, threshold=5.775e+01, percent-clipped=1.0 2024-08-11 21:18:36,363 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 from AS 2024-08-11 21:18:49,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1291080.0, ans=0.125 2024-08-11 21:18:52,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-11 21:18:59,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-11 21:19:01,538 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-11 21:19:10,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. 
limit=15.0 2024-08-11 21:19:11,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1291280.0, ans=0.125 2024-08-11 21:19:14,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1291280.0, ans=0.1 2024-08-11 21:19:24,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13200, loss[loss=0.08915, beats_loss=0.01333, ecapa_loss=0.0001578, whisper_loss=0.07425, over 20041.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01134, ecapa_loss=0.0001877, whisper_loss=0.09247, over 3903336.36 frames. ], batch size: 80, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:19:32,201 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS 2024-08-11 21:20:06,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-11 21:20:20,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1291680.0, ans=0.1 2024-08-11 21:20:24,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-08-11 21:20:28,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1291680.0, ans=0.05 2024-08-11 21:20:48,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13250, loss[loss=0.129, beats_loss=0.009904, ecapa_loss=0.0002023, whisper_loss=0.117, over 17671.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.000188, whisper_loss=0.09378, over 3906420.46 frames. 
], batch size: 68, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:21:02,857 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 from AS 2024-08-11 21:21:03,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1291980.0, ans=0.125 2024-08-11 21:21:15,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1291980.0, ans=0.2 2024-08-11 21:21:21,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.554e+01 3.002e+01 3.444e+01 4.623e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 21:21:23,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1292080.0, ans=0.125 2024-08-11 21:21:46,514 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS 2024-08-11 21:21:48,230 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 27 from Vox, 26 from AS 2024-08-11 21:22:05,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13300, loss[loss=0.09742, beats_loss=0.01327, ecapa_loss=0.0001499, whisper_loss=0.08265, over 17841.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001886, whisper_loss=0.09343, over 3873031.88 frames. ], batch size: 68, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:22:14,604 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
13 from LS+wenet, 22 from Vox, 24 from AS 2024-08-11 21:22:20,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1292380.0, ans=0.125 2024-08-11 21:22:38,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1292580.0, ans=0.05 2024-08-11 21:23:09,521 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 21:23:22,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1292780.0, ans=0.125 2024-08-11 21:23:25,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13350, loss[loss=0.1236, beats_loss=0.01208, ecapa_loss=0.0001587, whisper_loss=0.1099, over 23047.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01121, ecapa_loss=0.0001885, whisper_loss=0.09339, over 3899291.81 frames. ], batch size: 89, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:23:35,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1292880.0, ans=0.0 2024-08-11 21:23:43,364 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:23:44,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1292980.0, ans=0.125 2024-08-11 21:23:50,294 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 21:23:55,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.575e+01 2.972e+01 3.296e+01 7.873e+01, threshold=5.944e+01, percent-clipped=3.0 2024-08-11 21:24:03,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1293080.0, ans=0.1 2024-08-11 21:24:04,903 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 21:24:22,257 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 from AS 2024-08-11 21:24:37,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13400, loss[loss=0.1056, beats_loss=0.01044, ecapa_loss=0.0001821, whisper_loss=0.0933, over 21441.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.000188, whisper_loss=0.09274, over 3879207.67 frames. ], batch size: 83, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:24:59,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1293480.0, ans=0.125 2024-08-11 21:25:03,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1293580.0, ans=0.2 2024-08-11 21:25:06,631 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:25:19,927 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
19 from LS+wenet, 26 from Vox, 35 from AS 2024-08-11 21:25:21,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293680.0, ans=0.1 2024-08-11 21:25:23,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1293680.0, ans=0.0 2024-08-11 21:25:24,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.80 vs. limit=22.5 2024-08-11 21:25:30,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2024-08-11 21:25:32,844 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 21:25:35,233 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS 2024-08-11 21:25:38,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1293780.0, ans=0.0 2024-08-11 21:25:46,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13450, loss[loss=0.109, beats_loss=0.01427, ecapa_loss=0.0002044, whisper_loss=0.09264, over 22375.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01115, ecapa_loss=0.0001893, whisper_loss=0.09332, over 3904780.28 frames. 
], batch size: 90, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:25:52,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1293880.0, ans=0.0 2024-08-11 21:25:57,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1293880.0, ans=0.125 2024-08-11 21:26:15,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.605e+01 2.918e+01 3.272e+01 4.452e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 21:26:55,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13500, loss[loss=0.09974, beats_loss=0.009683, ecapa_loss=0.0001938, whisper_loss=0.08812, over 17888.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01113, ecapa_loss=0.0001902, whisper_loss=0.09302, over 3881031.98 frames. ], batch size: 70, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:26:59,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1294380.0, ans=0.1 2024-08-11 21:27:03,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1294380.0, ans=0.0 2024-08-11 21:27:19,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.69 vs. 
limit=12.0 2024-08-11 21:27:25,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1294580.0, ans=0.125 2024-08-11 21:27:25,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1294580.0, ans=0.09899494936611666 2024-08-11 21:27:31,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1294580.0, ans=0.125 2024-08-11 21:27:39,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1294680.0, ans=10.0 2024-08-11 21:27:48,893 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 21:28:03,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13550, loss[loss=0.1082, beats_loss=0.01283, ecapa_loss=0.0001683, whisper_loss=0.09368, over 23593.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01113, ecapa_loss=0.0001909, whisper_loss=0.09313, over 3875506.94 frames. ], batch size: 94, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:28:07,800 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 21:28:12,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1294880.0, ans=0.2 2024-08-11 21:28:12,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0 2024-08-11 21:28:14,958 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 from AS 2024-08-11 21:28:16,284 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 from AS 2024-08-11 21:28:19,032 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 21:28:23,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2024-08-11 21:28:32,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.573e+01 2.919e+01 3.325e+01 1.633e+02, threshold=5.839e+01, percent-clipped=1.0 2024-08-11 21:28:32,613 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 from AS 2024-08-11 21:28:38,178 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 21:28:46,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-11 21:28:47,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1295180.0, ans=0.0 2024-08-11 21:28:49,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1295180.0, ans=0.0 2024-08-11 21:29:01,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1295280.0, ans=0.125 2024-08-11 21:29:12,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13600, loss[loss=0.09106, beats_loss=0.01154, ecapa_loss=0.000214, whisper_loss=0.07738, over 17832.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001904, whisper_loss=0.09257, over 3867931.61 frames. ], batch size: 75, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:29:12,236 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 12 from Vox, 25 from AS 2024-08-11 21:29:25,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1295480.0, ans=0.125 2024-08-11 21:29:31,212 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 24 from Vox, 47 from AS 2024-08-11 21:29:44,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1295580.0, ans=0.125 2024-08-11 21:29:47,766 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.493e+00 2024-08-11 21:29:57,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1295680.0, ans=0.125 2024-08-11 21:30:12,114 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 23 from Vox, 18 from AS 2024-08-11 21:30:15,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1295780.0, ans=0.0 2024-08-11 21:30:20,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13650, loss[loss=0.1008, beats_loss=0.009025, ecapa_loss=0.0002127, whisper_loss=0.0897, over 14483.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0112, ecapa_loss=0.0001896, whisper_loss=0.09276, over 3879981.79 frames. 
], batch size: 60, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:30:20,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1295880.0, ans=0.125 2024-08-11 21:30:40,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1295980.0, ans=0.125 2024-08-11 21:30:41,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1295980.0, ans=0.125 2024-08-11 21:30:47,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1296080.0, ans=0.125 2024-08-11 21:30:48,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.503e+01 2.904e+01 3.318e+01 5.006e+01, threshold=5.809e+01, percent-clipped=0.0 2024-08-11 21:30:50,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1296080.0, ans=0.015 2024-08-11 21:30:54,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1296080.0, ans=0.1 2024-08-11 21:30:54,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1296080.0, ans=0.02 2024-08-11 21:30:56,045 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
20 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 21:30:56,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1296080.0, ans=0.04949747468305833 2024-08-11 21:31:20,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1296280.0, ans=15.0 2024-08-11 21:31:28,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13700, loss[loss=0.09316, beats_loss=0.009786, ecapa_loss=0.0001667, whisper_loss=0.08171, over 18885.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01122, ecapa_loss=0.00019, whisper_loss=0.09299, over 3887274.65 frames. ], batch size: 77, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:31:43,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1296480.0, ans=0.0 2024-08-11 21:31:44,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1296480.0, ans=0.125 2024-08-11 21:31:44,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1296480.0, ans=0.125 2024-08-11 21:31:48,639 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 30 from LS+wenet, 11 from Vox, 27 from AS 2024-08-11 21:31:54,151 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
29 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 21:32:28,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1296780.0, ans=0.0 2024-08-11 21:32:30,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1296780.0, ans=0.2 2024-08-11 21:32:38,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13750, loss[loss=0.09136, beats_loss=0.01447, ecapa_loss=0.0001754, whisper_loss=0.07514, over 13676.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01123, ecapa_loss=0.0001891, whisper_loss=0.09309, over 3859246.05 frames. ], batch size: 54, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:33:01,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1296980.0, ans=0.125 2024-08-11 21:33:07,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.561e+01 2.855e+01 3.257e+01 5.078e+01, threshold=5.711e+01, percent-clipped=0.0 2024-08-11 21:33:15,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1297080.0, ans=0.125 2024-08-11 21:33:48,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13800, loss[loss=0.09352, beats_loss=0.01413, ecapa_loss=0.0001394, whisper_loss=0.07799, over 23343.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01123, ecapa_loss=0.0001893, whisper_loss=0.09269, over 3851425.81 frames. 
], batch size: 91, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:34:12,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1297480.0, ans=0.125 2024-08-11 21:34:22,665 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.599e-03 2024-08-11 21:34:30,332 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 37 from Vox, 28 from AS 2024-08-11 21:34:37,277 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 from AS 2024-08-11 21:34:38,591 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 from AS 2024-08-11 21:34:43,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.30 vs. limit=10.0 2024-08-11 21:34:48,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2024-08-11 21:34:57,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13850, loss[loss=0.09904, beats_loss=0.01379, ecapa_loss=0.0001907, whisper_loss=0.08333, over 21084.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01119, ecapa_loss=0.0001887, whisper_loss=0.09312, over 3869984.74 frames. 
], batch size: 87, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:35:03,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1297880.0, ans=0.1 2024-08-11 21:35:17,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1297980.0, ans=0.04949747468305833 2024-08-11 21:35:26,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.750e+01 3.088e+01 3.546e+01 6.102e+01, threshold=6.176e+01, percent-clipped=2.0 2024-08-11 21:35:29,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1298080.0, ans=0.125 2024-08-11 21:35:33,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-11 21:35:34,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1298080.0, ans=0.0 2024-08-11 21:35:42,402 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS 2024-08-11 21:35:43,922 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 35 from LS+wenet, 17 from Vox, 30 from AS 2024-08-11 21:35:50,743 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 from AS 2024-08-11 21:35:54,911 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 21:35:55,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1298280.0, ans=0.125 2024-08-11 21:36:05,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13900, loss[loss=0.09489, beats_loss=0.01343, ecapa_loss=0.0001784, whisper_loss=0.07968, over 19017.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0001877, whisper_loss=0.09361, over 3905981.92 frames. ], batch size: 80, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:36:14,314 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 24 from Vox, 27 from AS 2024-08-11 21:36:29,260 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 from AS 2024-08-11 21:36:38,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1298580.0, ans=0.0 2024-08-11 21:37:03,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1298780.0, ans=0.125 2024-08-11 21:37:07,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1298780.0, ans=0.125 2024-08-11 21:37:13,896 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 21:37:14,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 13950, loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001883, whisper_loss=0.09081, over 22366.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01116, ecapa_loss=0.0001882, whisper_loss=0.09412, over 3919268.93 frames. ], batch size: 91, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:37:20,608 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-11 21:37:23,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1298880.0, ans=0.2 2024-08-11 21:37:43,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.710e+01 3.048e+01 3.326e+01 4.854e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-11 21:37:45,687 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2024-08-11 21:38:01,376 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 21:38:05,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2024-08-11 21:38:12,746 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 21:38:19,589 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 28 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 21:38:23,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14000, loss[loss=0.1076, beats_loss=0.01095, ecapa_loss=0.000206, whisper_loss=0.09459, over 20221.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01118, ecapa_loss=0.0001868, whisper_loss=0.09398, over 3931938.52 frames. ], batch size: 81, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:38:25,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=15.0 2024-08-11 21:38:26,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1299380.0, ans=0.0 2024-08-11 21:38:36,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1299480.0, ans=0.125 2024-08-11 21:38:37,370 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 21:38:53,711 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 21:38:55,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1299580.0, ans=0.125 2024-08-11 21:38:55,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1299580.0, ans=0.125 2024-08-11 21:39:18,883 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-11 21:39:32,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14050, loss[loss=0.1298, beats_loss=0.009107, ecapa_loss=0.0001679, whisper_loss=0.119, over 24273.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01119, ecapa_loss=0.0001854, whisper_loss=0.09388, over 3938387.10 frames. ], batch size: 90, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:39:37,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=15.0 2024-08-11 21:39:48,900 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:40:01,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.630e+01 2.929e+01 3.311e+01 9.104e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 21:40:28,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.20 vs. limit=15.0 2024-08-11 21:40:31,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1300280.0, ans=0.125 2024-08-11 21:40:41,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14100, loss[loss=0.1042, beats_loss=0.009186, ecapa_loss=0.0001779, whisper_loss=0.09319, over 14749.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01119, ecapa_loss=0.0001854, whisper_loss=0.09284, over 3899769.77 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:40:48,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1300380.0, ans=0.125 2024-08-11 21:40:55,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=12.0 2024-08-11 21:40:57,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1300480.0, ans=0.125 2024-08-11 21:41:00,808 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 21:41:08,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1300580.0, ans=0.125 2024-08-11 21:41:14,829 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 21:41:20,360 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 21:41:27,139 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 21:41:27,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=15.0 2024-08-11 21:41:50,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14150, loss[loss=0.09489, beats_loss=0.01157, ecapa_loss=0.0001857, whisper_loss=0.08146, over 22430.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001866, whisper_loss=0.09263, over 3912137.15 frames. ], batch size: 91, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:42:18,890 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.563e-03 2024-08-11 21:42:19,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.615e+01 2.850e+01 3.033e+01 5.082e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 21:42:30,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=12.0 2024-08-11 21:42:47,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1301280.0, ans=0.0 2024-08-11 21:42:47,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1301280.0, ans=0.0 2024-08-11 21:42:55,563 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 27 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-11 21:42:59,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14200, loss[loss=0.1188, beats_loss=0.01006, ecapa_loss=0.000173, whisper_loss=0.1071, over 20665.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01125, ecapa_loss=0.0001865, whisper_loss=0.09179, over 3884194.59 frames. ], batch size: 81, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:43:11,599 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 21:43:26,516 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 21:43:36,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1301580.0, ans=0.125 2024-08-11 21:43:43,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301680.0, ans=0.1 2024-08-11 21:44:02,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2024-08-11 21:44:08,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14250, loss[loss=0.09991, beats_loss=0.01041, ecapa_loss=0.0002677, whisper_loss=0.08682, over 15726.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001874, whisper_loss=0.09216, over 3892569.08 frames. ], batch size: 66, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:44:08,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1301880.0, ans=0.1 2024-08-11 21:44:09,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1301880.0, ans=0.2 2024-08-11 21:44:12,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.24 vs. 
limit=10.0 2024-08-11 21:44:21,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1301980.0, ans=0.125 2024-08-11 21:44:36,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1302080.0, ans=0.125 2024-08-11 21:44:37,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.747e+01 3.033e+01 3.629e+01 5.919e+01, threshold=6.067e+01, percent-clipped=2.0 2024-08-11 21:44:41,492 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 21:44:54,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1302180.0, ans=0.125 2024-08-11 21:45:01,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1302180.0, ans=0.125 2024-08-11 21:45:04,669 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 21:45:07,487 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 21:45:14,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1302280.0, ans=0.0 2024-08-11 21:45:14,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=12.0 2024-08-11 21:45:18,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14300, loss[loss=0.1137, beats_loss=0.0116, ecapa_loss=0.0001769, whisper_loss=0.1003, over 22662.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01117, ecapa_loss=0.0001876, whisper_loss=0.09222, over 3911222.15 frames. ], batch size: 91, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:45:21,551 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 21:45:22,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-08-11 21:45:30,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-11 21:45:36,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1302480.0, ans=0.2 2024-08-11 21:45:46,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-11 21:46:13,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1302780.0, ans=0.1 2024-08-11 21:46:27,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14350, loss[loss=0.09434, beats_loss=0.01333, ecapa_loss=0.0001702, whisper_loss=0.07931, over 22508.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01114, ecapa_loss=0.0001877, whisper_loss=0.09231, over 3923933.19 frames. ], batch size: 89, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:46:34,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1302880.0, ans=0.2 2024-08-11 21:46:43,008 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 21:46:48,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302980.0, ans=0.1 2024-08-11 21:46:52,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1302980.0, ans=0.125 2024-08-11 21:46:56,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.719e+01 2.981e+01 3.464e+01 5.321e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 21:47:06,284 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 21:47:07,966 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 21:47:12,161 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 21:47:13,519 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 21:47:28,695 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 21:47:29,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1303280.0, ans=0.125 2024-08-11 21:47:37,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14400, loss[loss=0.09728, beats_loss=0.01233, ecapa_loss=0.0001818, whisper_loss=0.08313, over 21746.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01112, ecapa_loss=0.0001881, whisper_loss=0.09272, over 3936028.47 frames. 
], batch size: 90, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:48:04,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1303580.0, ans=0.0 2024-08-11 21:48:07,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1303580.0, ans=0.0 2024-08-11 21:48:21,755 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 21:48:30,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1303680.0, ans=0.0 2024-08-11 21:48:34,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1303780.0, ans=0.2 2024-08-11 21:48:41,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1303780.0, ans=0.0 2024-08-11 21:48:45,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 9, batch 14450, loss[loss=0.1092, beats_loss=0.0102, ecapa_loss=0.0002568, whisper_loss=0.0964, over 17045.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.000189, whisper_loss=0.09253, over 3886838.94 frames. ], batch size: 71, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:48:53,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1303880.0, ans=0.125 2024-08-11 21:49:13,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.624e+01 2.900e+01 3.333e+01 5.803e+01, threshold=5.799e+01, percent-clipped=0.0 2024-08-11 21:49:23,080 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 21:50:30,442 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 0, loss[loss=0.1051, beats_loss=0.01124, ecapa_loss=0.000188, whisper_loss=0.09195, over 21347.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01124, ecapa_loss=0.000188, whisper_loss=0.09195, over 21347.00 frames. ], batch size: 81, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:50:30,442 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 21:51:13,143 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on ASR_libri: loss=0.2568, beats_loss=0, ecapa_loss=0.0006206, whisper_loss=0.2506, over 922467.00 frames. 2024-08-11 21:51:29,265 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on SV_voxceleb1: loss=0.005051, beats_loss=0, ecapa_loss=0.0005051, whisper_loss=0, over 939242.00 frames. 2024-08-11 21:53:11,569 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4135, 4.8146, 5.2997, 5.4868], device='cuda:1') 2024-08-11 21:53:33,403 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on AT_audioset: loss=0.02495, beats_loss=0.02495, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 21:53:33,406 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 21:54:01,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1304420.0, ans=0.2 2024-08-11 21:54:32,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1304520.0, ans=0.95 2024-08-11 21:54:32,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1304520.0, ans=0.0 2024-08-11 21:55:14,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1304620.0, ans=0.1 2024-08-11 21:55:32,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1304720.0, ans=0.09899494936611666 2024-08-11 21:55:43,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 50, loss[loss=0.09831, beats_loss=0.009567, ecapa_loss=0.0001938, whisper_loss=0.08681, over 17763.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0002042, whisper_loss=0.08855, over 889224.03 frames. ], batch size: 71, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:56:17,750 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 21:56:47,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.905e+01 3.307e+01 3.702e+01 5.786e+01, threshold=6.614e+01, percent-clipped=0.0 2024-08-11 21:56:58,347 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:57:33,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1305220.0, ans=0.125 2024-08-11 21:57:41,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 100, loss[loss=0.08107, beats_loss=0.01043, ecapa_loss=0.000191, whisper_loss=0.06873, over 13152.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001959, whisper_loss=0.09115, over 1535329.49 frames. ], batch size: 53, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:57:55,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1305320.0, ans=0.125 2024-08-11 21:58:19,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1305420.0, ans=0.07 2024-08-11 21:58:26,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-11 21:58:35,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1305520.0, ans=0.125 2024-08-11 21:58:43,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1305520.0, ans=0.0 2024-08-11 21:59:13,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. 
limit=10.0 2024-08-11 21:59:29,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1305720.0, ans=0.125 2024-08-11 21:59:32,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 150, loss[loss=0.09788, beats_loss=0.01041, ecapa_loss=0.0002027, whisper_loss=0.08544, over 17283.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01045, ecapa_loss=0.0001945, whisper_loss=0.09219, over 2056296.61 frames. ], batch size: 69, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:59:32,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1305820.0, ans=0.0 2024-08-11 22:00:09,902 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 22:00:10,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1306020.0, ans=0.2 2024-08-11 22:00:20,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.798e+01 3.187e+01 3.633e+01 2.129e+02, threshold=6.375e+01, percent-clipped=1.0 2024-08-11 22:00:49,707 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 22:00:54,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1306220.0, ans=0.125 2024-08-11 22:00:56,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1306220.0, ans=0.125 2024-08-11 22:00:58,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 200, loss[loss=0.09787, beats_loss=0.01142, ecapa_loss=0.0001791, whisper_loss=0.08466, over 15181.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01054, ecapa_loss=0.0001919, whisper_loss=0.09281, over 2435936.99 frames. 
], batch size: 63, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:01:04,637 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-11 22:01:09,157 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 22:01:10,952 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 22:01:17,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1306420.0, ans=0.07 2024-08-11 22:01:19,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-11 22:01:21,475 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 30 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 22:01:22,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1306420.0, ans=6.0 2024-08-11 22:01:22,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-11 22:01:33,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1306520.0, ans=0.05 2024-08-11 22:01:35,824 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 22:02:01,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:05,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1306720.0, ans=0.2 2024-08-11 22:02:14,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 250, loss[loss=0.1009, beats_loss=0.01214, ecapa_loss=0.0001678, whisper_loss=0.08713, over 21211.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01066, ecapa_loss=0.000188, whisper_loss=0.0936, over 2732999.37 frames. ], batch size: 83, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:02:21,022 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-11 22:02:22,377 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 22:02:22,696 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:02:22,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1306820.0, ans=0.0 2024-08-11 22:02:36,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1306920.0, ans=0.0 2024-08-11 22:02:40,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1306920.0, ans=0.0 2024-08-11 22:02:57,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.454e+01 2.692e+01 3.153e+01 8.296e+01, threshold=5.384e+01, percent-clipped=2.0 2024-08-11 22:03:04,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1307120.0, ans=0.2 2024-08-11 22:03:07,354 INFO [scaling.py:214] (1/4) 
ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1307120.0, ans=0.0 2024-08-11 22:03:14,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1307120.0, ans=0.2 2024-08-11 22:03:22,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-11 22:03:31,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 300, loss[loss=0.1271, beats_loss=0.008774, ecapa_loss=0.000184, whisper_loss=0.1164, over 24685.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001858, whisper_loss=0.09194, over 2946364.84 frames. ], batch size: 92, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:03:31,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1307320.0, ans=0.125 2024-08-11 22:04:08,504 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 22:04:18,603 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 22:04:22,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1307620.0, ans=0.125 2024-08-11 22:04:29,361 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 22:04:30,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1307720.0, ans=0.125 2024-08-11 22:04:38,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1307720.0, ans=0.125 2024-08-11 22:04:38,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1307720.0, ans=0.125 2024-08-11 22:04:46,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 350, loss[loss=0.0936, beats_loss=0.01144, ecapa_loss=0.0001868, whisper_loss=0.08029, over 19561.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.000186, whisper_loss=0.0921, over 3159670.80 frames. ], batch size: 78, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:04:51,257 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 22:04:57,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1307820.0, ans=0.0 2024-08-11 22:05:02,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1307920.0, ans=0.0 2024-08-11 22:05:03,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.86 vs. limit=22.5 2024-08-11 22:05:06,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1307920.0, ans=0.125 2024-08-11 22:05:08,296 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 22:05:11,201 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 22:05:14,349 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 22:05:26,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.513e+01 2.913e+01 3.282e+01 4.748e+01, threshold=5.825e+01, percent-clipped=0.0 2024-08-11 22:05:45,963 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 22:06:01,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 400, loss[loss=0.1293, beats_loss=0.008765, ecapa_loss=0.0001804, whisper_loss=0.1188, over 17930.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01085, ecapa_loss=0.0001864, whisper_loss=0.09266, over 3314370.91 frames. ], batch size: 67, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:06:03,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1308320.0, ans=0.2 2024-08-11 22:06:09,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1308320.0, ans=0.0 2024-08-11 22:06:09,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1308320.0, ans=0.0 2024-08-11 22:06:17,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1308420.0, ans=0.0 2024-08-11 22:06:17,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1308420.0, ans=0.125 2024-08-11 22:06:39,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2024-08-11 22:06:50,661 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-11 22:06:53,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1308620.0, ans=0.0 2024-08-11 22:06:57,529 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 22:07:08,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1308720.0, ans=0.0 2024-08-11 22:07:09,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1308720.0, ans=0.2 2024-08-11 22:07:12,273 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 22:07:15,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1308720.0, ans=0.125 2024-08-11 22:07:17,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 450, loss[loss=0.1114, beats_loss=0.009847, ecapa_loss=0.0001927, whisper_loss=0.09958, over 19567.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001861, whisper_loss=0.0919, over 3444863.40 frames. ], batch size: 75, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:07:47,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1309020.0, ans=0.1 2024-08-11 22:07:55,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1309020.0, ans=0.125 2024-08-11 22:07:57,643 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 22:07:58,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.488e+01 3.017e+01 3.515e+01 8.522e+01, threshold=6.035e+01, percent-clipped=1.0 2024-08-11 22:08:24,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309220.0, ans=0.1 2024-08-11 22:08:33,624 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 500, loss[loss=0.1114, beats_loss=0.007558, ecapa_loss=0.0002057, whisper_loss=0.1018, over 14543.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001837, whisper_loss=0.09134, over 3520472.58 frames. ], batch size: 55, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:08:35,358 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-11 22:08:48,453 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 22:08:53,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1309420.0, ans=0.125 2024-08-11 22:09:07,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1309520.0, ans=0.125 2024-08-11 22:09:11,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309520.0, ans=0.1 2024-08-11 22:09:12,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.33 vs. 
limit=15.0 2024-08-11 22:09:17,108 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.389e-01 2024-08-11 22:09:18,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2024-08-11 22:09:19,321 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 22:09:50,825 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 550, loss[loss=0.09629, beats_loss=0.009881, ecapa_loss=0.0002311, whisper_loss=0.08409, over 19511.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001827, whisper_loss=0.09186, over 3589236.17 frames. ], batch size: 80, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:09:57,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1309820.0, ans=0.125 2024-08-11 22:09:59,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-08-11 22:10:24,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1310020.0, ans=0.2 2024-08-11 22:10:28,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1310020.0, ans=0.1 2024-08-11 22:10:32,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.567e+01 3.001e+01 3.540e+01 6.068e+01, threshold=6.003e+01, percent-clipped=1.0 2024-08-11 22:10:37,149 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 22:10:50,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1310120.0, ans=0.125 2024-08-11 22:11:00,520 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 22:11:07,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 600, loss[loss=0.1049, beats_loss=0.01069, ecapa_loss=0.0002455, whisper_loss=0.09177, over 17001.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01101, ecapa_loss=0.0001825, whisper_loss=0.0918, over 3651498.42 frames. ], batch size: 75, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:11:15,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1310320.0, ans=0.125 2024-08-11 22:11:33,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-11 22:11:40,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1310520.0, ans=0.1 2024-08-11 22:11:48,182 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 22:11:48,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1310520.0, ans=0.2 2024-08-11 22:11:57,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-11 22:12:09,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1310720.0, ans=0.125 2024-08-11 22:12:15,877 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. 
limit=15.0 2024-08-11 22:12:24,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 650, loss[loss=0.1112, beats_loss=0.01086, ecapa_loss=0.0001916, whisper_loss=0.09843, over 21859.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001819, whisper_loss=0.09145, over 3691029.28 frames. ], batch size: 82, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:12:36,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1310820.0, ans=0.0 2024-08-11 22:12:53,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1311020.0, ans=0.125 2024-08-11 22:12:58,892 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 22:13:01,537 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 22:13:03,428 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 22:13:04,483 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.576e+01 2.785e+01 3.016e+01 3.995e+01, threshold=5.570e+01, percent-clipped=0.0 2024-08-11 22:13:29,403 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 22:13:32,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1311220.0, ans=0.0 2024-08-11 22:13:40,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 700, loss[loss=0.1143, beats_loss=0.01005, ecapa_loss=0.0002123, whisper_loss=0.1022, over 18955.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01114, ecapa_loss=0.0001809, whisper_loss=0.09124, over 3697578.06 frames. 
], batch size: 72, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:13:48,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1311320.0, ans=0.125 2024-08-11 22:13:50,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-11 22:13:57,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-11 22:13:57,776 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 22:14:00,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1311420.0, ans=0.125 2024-08-11 22:14:22,390 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 22:14:39,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1311620.0, ans=0.125 2024-08-11 22:14:49,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1311720.0, ans=0.0 2024-08-11 22:14:49,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2024-08-11 22:14:59,171 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-11 22:15:00,350 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 750, loss[loss=0.0988, beats_loss=0.008055, ecapa_loss=0.0002174, whisper_loss=0.08857, over 15709.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01105, ecapa_loss=0.0001817, whisper_loss=0.09151, over 3720718.17 frames. 
], batch size: 61, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:15:41,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.583e+01 2.835e+01 3.303e+01 6.155e+01, threshold=5.670e+01, percent-clipped=2.0 2024-08-11 22:15:49,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1312120.0, ans=0.2 2024-08-11 22:16:02,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-11 22:16:12,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1312220.0, ans=0.1 2024-08-11 22:16:16,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1312320.0, ans=0.2 2024-08-11 22:16:17,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 800, loss[loss=0.1116, beats_loss=0.009934, ecapa_loss=0.000169, whisper_loss=0.09999, over 21507.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01117, ecapa_loss=0.0001803, whisper_loss=0.0911, over 3777028.01 frames. ], batch size: 83, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:16:43,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1312420.0, ans=0.1 2024-08-11 22:17:06,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1312520.0, ans=0.0 2024-08-11 22:17:14,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-11 22:17:24,013 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 22:17:52,222 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 850, loss[loss=0.124, beats_loss=0.00911, ecapa_loss=0.0002054, whisper_loss=0.1128, over 14920.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01117, ecapa_loss=0.00018, whisper_loss=0.09033, over 3780119.28 frames. ], batch size: 58, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:17:53,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2024-08-11 22:17:56,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-08-11 22:18:03,992 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 22:18:36,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1313020.0, ans=0.2 2024-08-11 22:18:41,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.539e+01 2.872e+01 3.296e+01 5.215e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 22:18:55,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1313120.0, ans=0.125 2024-08-11 22:19:09,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1313220.0, ans=0.0 2024-08-11 22:19:19,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1313220.0, ans=0.125 2024-08-11 22:19:25,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 900, loss[loss=0.1105, beats_loss=0.01197, ecapa_loss=0.0001744, whisper_loss=0.09679, over 20349.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01105, ecapa_loss=0.0001806, whisper_loss=0.09117, over 3776015.57 frames. ], batch size: 80, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:19:29,885 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 22:19:39,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1313320.0, ans=0.2 2024-08-11 22:20:01,494 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 22:20:13,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1313520.0, ans=0.09899494936611666 2024-08-11 22:20:17,552 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 22:20:28,859 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 22:20:34,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-11 22:20:36,152 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:21:02,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 950, loss[loss=0.09254, beats_loss=0.01089, ecapa_loss=0.000199, whisper_loss=0.07966, over 14913.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01112, ecapa_loss=0.0001793, whisper_loss=0.09078, over 3781790.05 frames. 
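The `scaling.py:214` lines report `ScheduledFloat` hyperparameters (skip rates, balancer probabilities, bypass scales) whose value is a function of `batch_count`. A minimal sketch of such a piecewise-linear schedule; the breakpoints below are hypothetical illustrations, not the ones used by this run:

```python
# Sketch of a ScheduledFloat-style hyperparameter: piecewise-linear in
# batch_count, clamped at the ends. The example breakpoints are made up
# for illustration; the real schedules live in the model definition.

def scheduled_float(batch_count, points):
    """points: sorted [(batch_count, value), ...]."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches:
sched = [(0.0, 0.5), (20000.0, 0.0)]
print(scheduled_float(10000.0, sched))  # → 0.25
```

At `batch_count=1313820.0`, far past the last breakpoint, such a schedule has long since reached its final value, which is why the logged values are constant across nearby batches.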
], batch size: 59, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:21:02,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1313820.0, ans=0.125 2024-08-11 22:21:09,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1313820.0, ans=0.0 2024-08-11 22:21:26,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1313920.0, ans=0.07 2024-08-11 22:21:46,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1314020.0, ans=0.05 2024-08-11 22:21:51,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.622e+01 2.859e+01 3.329e+01 4.580e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-11 22:22:06,287 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 22:22:34,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1000, loss[loss=0.1168, beats_loss=0.01023, ecapa_loss=0.0001751, whisper_loss=0.1048, over 22825.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001804, whisper_loss=0.09112, over 3785361.46 frames. ], batch size: 91, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:22:43,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-11 22:22:49,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1314320.0, ans=15.0 2024-08-11 22:22:50,369 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 22:23:18,024 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 22:23:23,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1314520.0, ans=0.125 2024-08-11 22:23:27,458 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 22:23:31,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2024-08-11 22:23:33,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1314620.0, ans=0.0 2024-08-11 22:23:43,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1314720.0, ans=0.125 2024-08-11 22:24:01,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1050, loss[loss=0.07447, beats_loss=0.01497, ecapa_loss=0.0001564, whisper_loss=0.05794, over 18004.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01112, ecapa_loss=0.0001795, whisper_loss=0.08985, over 3773956.81 frames. ], batch size: 74, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:24:11,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1314820.0, ans=0.2 2024-08-11 22:24:16,648 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 22:24:18,048 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 22:24:19,485 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 10 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 22:24:23,797 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 22:24:38,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.421e+01 2.684e+01 3.099e+01 9.894e+01, threshold=5.368e+01, percent-clipped=2.0 2024-08-11 22:24:38,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1315020.0, ans=0.2 2024-08-11 22:24:53,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1315120.0, ans=0.0 2024-08-11 22:25:02,957 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 22:25:09,444 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1100, loss[loss=0.1033, beats_loss=0.01105, ecapa_loss=0.0002129, whisper_loss=0.09009, over 22175.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01111, ecapa_loss=0.0001799, whisper_loss=0.09017, over 3772251.46 frames. ], batch size: 91, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:25:15,074 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 22:25:24,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1315420.0, ans=0.2 2024-08-11 22:25:29,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
limit=22.5 2024-08-11 22:25:35,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1315520.0, ans=0.0 2024-08-11 22:25:35,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1315520.0, ans=0.125 2024-08-11 22:25:39,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1315520.0, ans=0.1 2024-08-11 22:25:40,762 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 26 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-11 22:25:41,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1315520.0, ans=0.125 2024-08-11 22:25:50,240 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 22:25:50,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1315620.0, ans=0.2 2024-08-11 22:25:59,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1315620.0, ans=0.0 2024-08-11 22:26:02,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1315720.0, ans=0.125 2024-08-11 22:26:17,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1150, loss[loss=0.08049, beats_loss=0.01183, ecapa_loss=0.0001676, whisper_loss=0.06699, over 13006.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0111, ecapa_loss=0.0001805, whisper_loss=0.09068, over 3787723.23 frames. ], batch size: 54, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:26:18,139 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 22:26:33,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1315920.0, ans=0.125 2024-08-11 22:26:40,127 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.216e+00 2024-08-11 22:26:41,600 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 22:26:47,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1316020.0, ans=0.125 2024-08-11 22:26:54,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.627e+01 2.933e+01 3.282e+01 4.582e+01, threshold=5.866e+01, percent-clipped=0.0 2024-08-11 22:26:56,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1316020.0, ans=0.2 2024-08-11 22:27:15,900 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 22:27:24,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1316220.0, ans=0.0 2024-08-11 22:27:26,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1200, loss[loss=0.1025, beats_loss=0.01212, ecapa_loss=0.0001588, whisper_loss=0.08877, over 20802.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01109, ecapa_loss=0.0001807, whisper_loss=0.0909, over 3814626.32 frames. ], batch size: 79, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:27:50,916 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 22:27:53,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1316520.0, ans=0.125 2024-08-11 22:28:18,315 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.076e+00 2024-08-11 22:28:35,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1250, loss[loss=0.08905, beats_loss=0.01284, ecapa_loss=0.0001869, whisper_loss=0.07434, over 20867.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01119, ecapa_loss=0.0001795, whisper_loss=0.09074, over 3826921.87 frames. ], batch size: 89, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:28:42,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1316820.0, ans=0.0 2024-08-11 22:28:48,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1316920.0, ans=0.0 2024-08-11 22:29:05,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1317020.0, ans=0.0 2024-08-11 22:29:05,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1317020.0, ans=0.1 2024-08-11 22:29:11,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.416e+01 2.652e+01 2.971e+01 4.212e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-11 22:29:15,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1317120.0, ans=0.125 2024-08-11 22:29:30,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.60 vs. 
limit=15.0 2024-08-11 22:29:33,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1317220.0, ans=0.0 2024-08-11 22:29:39,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1317220.0, ans=0.07 2024-08-11 22:29:42,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1317320.0, ans=0.125 2024-08-11 22:29:43,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1300, loss[loss=0.1354, beats_loss=0.01018, ecapa_loss=0.0001678, whisper_loss=0.1235, over 19175.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0111, ecapa_loss=0.0001796, whisper_loss=0.09135, over 3838846.33 frames. ], batch size: 73, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:29:43,386 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-11 22:30:04,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-08-11 22:30:42,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1317720.0, ans=0.0 2024-08-11 22:30:46,309 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-11 22:30:48,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1317720.0, ans=0.1 2024-08-11 22:30:48,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1317720.0, ans=15.0 2024-08-11 22:30:51,450 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1350, loss[loss=0.08942, beats_loss=0.01223, ecapa_loss=0.0001427, whisper_loss=0.07576, over 20501.00 frames. 
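The `optim.py:476` lines report five order statistics (min, 25%, median, 75%, max) of recent gradient norms, a clipping threshold, and the percentage of batches clipped. In every such line above, the threshold equals `Clipping_scale` (2.0) times the logged median (e.g. `2.0 × 2.913e+01 ≈ 5.825e+01`). A sketch of those statistics under that inferred rule:

```python
import numpy as np

# Sketch of the "Clipping_scale=2.0, grad-norm quartiles ..." statistics.
# That threshold = clipping_scale * median is inferred from the logged
# numbers, not from reading optim.py; treat it as an assumption.

def grad_norm_stats(norms, clipping_scale=2.0):
    quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]          # 2x the median
    percent_clipped = 100.0 * np.mean(norms > threshold)
    return quartiles, threshold, percent_clipped
```

Feeding back the quartiles from the batch-400 line (`19.21, 25.13, 29.13, 32.82, 47.48`) reproduces `threshold=58.26` and `percent-clipped=0.0`.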
], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001782, whisper_loss=0.09149, over 3823727.47 frames. ], batch size: 79, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:31:07,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1317920.0, ans=0.0 2024-08-11 22:31:28,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.605e+01 2.891e+01 3.294e+01 5.251e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-11 22:32:00,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1400, loss[loss=0.07597, beats_loss=0.01079, ecapa_loss=0.0002069, whisper_loss=0.06312, over 19369.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01113, ecapa_loss=0.0001769, whisper_loss=0.09124, over 3845656.66 frames. ], batch size: 76, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:32:07,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=15.0 2024-08-11 22:32:13,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1318420.0, ans=0.07 2024-08-11 22:32:33,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318520.0, ans=0.1 2024-08-11 22:32:40,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1318520.0, ans=0.125 2024-08-11 22:32:41,560 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 22:32:50,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.21 vs. 
limit=22.5 2024-08-11 22:33:11,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1450, loss[loss=0.1118, beats_loss=0.009141, ecapa_loss=0.0001274, whisper_loss=0.1013, over 15148.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001759, whisper_loss=0.09155, over 3832095.72 frames. ], batch size: 53, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:33:11,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1318820.0, ans=10.0 2024-08-11 22:33:39,049 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 22:33:40,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1318820.0, ans=0.125 2024-08-11 22:33:54,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1318920.0, ans=0.125 2024-08-11 22:34:01,357 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 22:34:05,007 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 31 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 22:34:06,273 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 22:34:12,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.426e+01 2.679e+01 3.124e+01 8.618e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-11 22:34:45,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1500, loss[loss=0.09162, beats_loss=0.01414, ecapa_loss=0.0001688, whisper_loss=0.07579, over 17303.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01115, ecapa_loss=0.0001762, whisper_loss=0.09114, over 3834649.82 frames. 
], batch size: 72, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:34:52,509 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 22:35:00,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1319420.0, ans=0.125 2024-08-11 22:35:27,345 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 22:35:35,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=22.5 2024-08-11 22:35:41,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1319620.0, ans=0.125 2024-08-11 22:35:43,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1319720.0, ans=0.125 2024-08-11 22:35:45,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1319720.0, ans=0.125 2024-08-11 22:35:52,740 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 22:35:57,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1550, loss[loss=0.1056, beats_loss=0.01192, ecapa_loss=0.0001601, whisper_loss=0.0921, over 18103.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01112, ecapa_loss=0.0001762, whisper_loss=0.09124, over 3845975.69 frames. 
], batch size: 69, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:35:58,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1319820.0, ans=0.125 2024-08-11 22:35:58,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1319820.0, ans=0.1 2024-08-11 22:35:59,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1319820.0, ans=0.0 2024-08-11 22:36:03,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1319820.0, ans=0.025 2024-08-11 22:36:05,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1319820.0, ans=0.5 2024-08-11 22:36:16,332 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 18 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 22:36:24,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1319920.0, ans=0.07 2024-08-11 22:36:27,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1320020.0, ans=0.125 2024-08-11 22:36:30,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=15.0 2024-08-11 22:36:33,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. 
limit=15.0 2024-08-11 22:36:36,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1320020.0, ans=0.0 2024-08-11 22:36:37,882 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.581e+01 2.864e+01 3.252e+01 1.978e+02, threshold=5.728e+01, percent-clipped=3.0 2024-08-11 22:36:39,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1320020.0, ans=0.1 2024-08-11 22:36:49,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2024-08-11 22:37:02,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1320220.0, ans=0.125 2024-08-11 22:37:03,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-11 22:37:09,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1600, loss[loss=0.1033, beats_loss=0.01, ecapa_loss=0.0002097, whisper_loss=0.09119, over 19612.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01109, ecapa_loss=0.0001763, whisper_loss=0.09163, over 3863698.17 frames. 
], batch size: 80, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:37:11,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1320320.0, ans=0.0 2024-08-11 22:37:14,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1320320.0, ans=0.04949747468305833 2024-08-11 22:37:23,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1320420.0, ans=0.125 2024-08-11 22:37:40,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1320520.0, ans=0.125 2024-08-11 22:38:01,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320620.0, ans=0.1 2024-08-11 22:38:09,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1320720.0, ans=0.125 2024-08-11 22:38:14,960 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 22:38:17,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1650, loss[loss=0.07442, beats_loss=0.01159, ecapa_loss=0.0001724, whisper_loss=0.06111, over 14134.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01112, ecapa_loss=0.0001773, whisper_loss=0.09127, over 3856043.72 frames. ], batch size: 58, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:38:18,967 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 22:38:19,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1320820.0, ans=0.125 2024-08-11 22:38:31,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-11 22:38:34,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1320920.0, ans=0.95 2024-08-11 22:38:55,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.537e+01 2.816e+01 3.147e+01 5.584e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-11 22:38:59,331 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 22:39:15,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1321220.0, ans=0.125 2024-08-11 22:39:19,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2024-08-11 22:39:27,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1700, loss[loss=0.1029, beats_loss=0.0127, ecapa_loss=0.0002129, whisper_loss=0.08812, over 21293.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01107, ecapa_loss=0.0001776, whisper_loss=0.09122, over 3842197.72 frames. ], batch size: 90, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:39:35,330 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 22:39:47,810 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-11 22:39:51,909 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 22:40:26,263 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 37 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 22:40:34,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1321820.0, ans=0.0 2024-08-11 22:40:35,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1750, loss[loss=0.08469, beats_loss=0.01379, ecapa_loss=0.0002045, whisper_loss=0.06886, over 22368.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001769, whisper_loss=0.09159, over 3865128.41 frames. ], batch size: 92, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:40:36,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1321820.0, ans=0.125 2024-08-11 22:40:37,082 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 22:41:08,908 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 22:41:13,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.558e+01 2.941e+01 3.375e+01 5.382e+01, threshold=5.883e+01, percent-clipped=0.0 2024-08-11 22:41:16,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1322120.0, ans=0.125 2024-08-11 22:41:25,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2024-08-11 22:41:35,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1322220.0, ans=0.0 2024-08-11 22:41:45,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1800, loss[loss=0.106, beats_loss=0.01211, ecapa_loss=0.0001576, whisper_loss=0.09227, over 24104.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.0001771, whisper_loss=0.09137, over 3891832.13 frames. ], batch size: 92, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:41:47,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1322320.0, ans=0.125 2024-08-11 22:42:04,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1322420.0, ans=0.125 2024-08-11 22:42:09,631 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 22:42:11,257 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 22:42:54,371 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 22:42:55,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1322720.0, ans=0.125 2024-08-11 22:42:58,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1850, loss[loss=0.1128, beats_loss=0.01205, ecapa_loss=0.0001717, whisper_loss=0.09898, over 21920.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.011, ecapa_loss=0.000177, whisper_loss=0.09133, over 3874721.14 frames. ], batch size: 86, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:43:11,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1322820.0, ans=0.1 2024-08-11 22:43:23,998 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 22:43:29,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1323020.0, ans=0.1 2024-08-11 22:43:32,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2024-08-11 22:43:39,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.593e+01 2.917e+01 3.347e+01 7.328e+01, threshold=5.834e+01, percent-clipped=1.0 2024-08-11 22:43:47,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1323120.0, ans=0.0 2024-08-11 22:44:07,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1323220.0, ans=0.125 2024-08-11 22:44:10,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2024-08-11 22:44:12,211 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1900, loss[loss=0.09205, beats_loss=0.01423, ecapa_loss=0.0001782, whisper_loss=0.07603, over 20800.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001805, whisper_loss=0.09137, over 3840860.79 frames. ], batch size: 86, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:44:22,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1323320.0, ans=0.125 2024-08-11 22:44:22,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=12.0 2024-08-11 22:44:43,022 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 22:45:12,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1323720.0, ans=0.2 2024-08-11 22:45:20,130 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 22:45:24,454 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 1950, loss[loss=0.0948, beats_loss=0.01082, ecapa_loss=0.000156, whisper_loss=0.08242, over 14778.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001816, whisper_loss=0.09145, over 3820226.28 frames. ], batch size: 55, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:45:24,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1323820.0, ans=0.125 2024-08-11 22:45:30,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=12.0 2024-08-11 22:45:33,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1323820.0, ans=0.0 2024-08-11 22:45:35,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1323820.0, ans=0.1 2024-08-11 22:45:39,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1323920.0, ans=0.0 2024-08-11 22:45:41,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1323920.0, ans=0.125 2024-08-11 22:45:55,747 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-11 22:46:01,050 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 22:46:02,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.530e+01 2.923e+01 3.581e+01 1.963e+02, threshold=5.846e+01, percent-clipped=3.0 2024-08-11 22:46:17,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324120.0, ans=0.1 2024-08-11 22:46:18,757 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 22:46:20,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1324220.0, ans=0.07 2024-08-11 22:46:22,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1324220.0, ans=0.2 2024-08-11 22:46:29,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.45 vs. limit=10.0 2024-08-11 22:46:30,670 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 22:46:36,553 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2000, loss[loss=0.09407, beats_loss=0.01177, ecapa_loss=0.0001478, whisper_loss=0.08083, over 16554.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.0001828, whisper_loss=0.09204, over 3843928.73 frames. ], batch size: 62, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:46:53,748 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 22:46:57,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1324420.0, ans=0.125 2024-08-11 22:47:04,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1324420.0, ans=0.1 2024-08-11 22:47:13,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-11 22:47:41,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1324720.0, ans=0.0 2024-08-11 22:47:51,921 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2050, loss[loss=0.08983, beats_loss=0.01012, ecapa_loss=0.0001775, whisper_loss=0.07794, over 19331.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001841, whisper_loss=0.09159, over 3825325.96 frames. ], batch size: 78, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:48:01,159 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-11 22:48:30,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.631e+01 3.014e+01 3.370e+01 4.766e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 22:48:43,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-08-11 22:48:58,949 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-11 22:49:03,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2100, loss[loss=0.08163, beats_loss=0.012, ecapa_loss=0.0002039, whisper_loss=0.06759, over 19906.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01111, ecapa_loss=0.0001831, whisper_loss=0.091, over 3804601.72 frames. ], batch size: 83, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:49:15,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.32 vs. limit=10.0 2024-08-11 22:49:31,456 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 22:49:47,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1325620.0, ans=15.0 2024-08-11 22:50:02,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1325720.0, ans=0.125 2024-08-11 22:50:11,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1325720.0, ans=0.125 2024-08-11 22:50:18,667 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2150, loss[loss=0.08757, beats_loss=0.01132, ecapa_loss=0.0001822, whisper_loss=0.07443, over 21610.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001836, whisper_loss=0.09157, over 3836320.49 frames. ], batch size: 89, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:50:25,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.07 vs. limit=10.0 2024-08-11 22:50:56,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1326020.0, ans=0.0 2024-08-11 22:50:57,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.560e+01 2.828e+01 3.267e+01 5.795e+01, threshold=5.656e+01, percent-clipped=0.0 2024-08-11 22:50:59,067 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-11 22:51:08,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1326120.0, ans=0.1 2024-08-11 22:51:31,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2200, loss[loss=0.1211, beats_loss=0.009319, ecapa_loss=0.0002136, whisper_loss=0.1097, over 20672.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.000183, whisper_loss=0.09205, over 3838654.74 frames. ], batch size: 83, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:51:32,646 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-11 22:51:35,876 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 22:51:36,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0 2024-08-11 22:51:46,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1326420.0, ans=0.125 2024-08-11 22:51:48,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1326420.0, ans=0.125 2024-08-11 22:52:11,246 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 22:52:11,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1326520.0, ans=0.05 2024-08-11 22:52:21,714 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 22:52:24,777 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
20 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 22:52:30,709 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:52:33,511 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 22:52:43,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1326820.0, ans=0.0 2024-08-11 22:52:44,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2250, loss[loss=0.11, beats_loss=0.01272, ecapa_loss=0.0001843, whisper_loss=0.09541, over 21872.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001841, whisper_loss=0.09273, over 3834283.60 frames. ], batch size: 89, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:52:46,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1326820.0, ans=0.125 2024-08-11 22:52:54,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1326820.0, ans=0.125 2024-08-11 22:53:07,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1326920.0, ans=0.125 2024-08-11 22:53:09,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1326920.0, ans=0.07 2024-08-11 22:53:09,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2024-08-11 22:53:16,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1327020.0, ans=0.125 2024-08-11 22:53:17,962 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 22:53:24,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.654e+01 2.933e+01 3.292e+01 6.746e+01, threshold=5.867e+01, percent-clipped=1.0 2024-08-11 22:53:32,440 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 40 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 22:53:46,143 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.524e+00 2024-08-11 22:53:55,091 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 22:53:58,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2300, loss[loss=0.1204, beats_loss=0.01104, ecapa_loss=0.0001888, whisper_loss=0.1075, over 21093.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01105, ecapa_loss=0.0001842, whisper_loss=0.09393, over 3867752.42 frames. ], batch size: 85, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:54:01,374 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.737e-01 2024-08-11 22:54:09,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1327320.0, ans=0.125 2024-08-11 22:54:09,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1327320.0, ans=0.0 2024-08-11 22:54:18,244 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 22:54:28,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1327520.0, ans=0.1 2024-08-11 22:54:46,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1327620.0, ans=0.0 2024-08-11 22:54:48,860 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-11 22:55:06,322 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 34 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 22:55:12,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2350, loss[loss=0.09429, beats_loss=0.01387, ecapa_loss=0.0001784, whisper_loss=0.07864, over 21247.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01107, ecapa_loss=0.0001849, whisper_loss=0.09358, over 3875471.56 frames. ], batch size: 85, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:55:12,689 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-11 22:55:15,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.71 vs. limit=6.0 2024-08-11 22:55:32,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1327920.0, ans=0.05 2024-08-11 22:55:37,960 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 40 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 22:55:39,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0 2024-08-11 22:55:49,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1328020.0, ans=0.2 2024-08-11 22:55:52,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.549e+01 2.872e+01 3.307e+01 6.850e+01, threshold=5.744e+01, percent-clipped=1.0 2024-08-11 22:56:00,019 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 22:56:25,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2400, loss[loss=0.1299, beats_loss=0.008225, ecapa_loss=0.0002265, whisper_loss=0.1194, over 23607.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01107, ecapa_loss=0.0001854, whisper_loss=0.09363, over 3885389.77 frames. ], batch size: 93, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:56:25,877 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 22:56:27,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1328320.0, ans=0.0 2024-08-11 22:56:31,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1328320.0, ans=0.0 2024-08-11 22:56:38,566 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 22:56:38,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1328420.0, ans=0.125 2024-08-11 22:56:53,617 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 22:56:53,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:56:53,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1328520.0, ans=0.04949747468305833 2024-08-11 22:56:56,717 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 22:56:59,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:57:00,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1328520.0, ans=0.1 2024-08-11 22:57:06,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:57:21,986 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 22:57:27,059 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 22:57:31,498 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 22:57:39,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1328720.0, ans=0.125 2024-08-11 22:57:41,972 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2450, loss[loss=0.108, beats_loss=0.01234, ecapa_loss=0.0001483, whisper_loss=0.09414, over 19178.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01107, ecapa_loss=0.0001855, whisper_loss=0.09349, over 3865124.33 frames. ], batch size: 75, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:57:54,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1328820.0, ans=0.05 2024-08-11 22:58:21,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.487e+01 2.806e+01 3.226e+01 5.199e+01, threshold=5.611e+01, percent-clipped=0.0 2024-08-11 22:58:21,679 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
30 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-11 22:58:56,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2500, loss[loss=0.09492, beats_loss=0.01134, ecapa_loss=0.0001702, whisper_loss=0.08188, over 21437.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01108, ecapa_loss=0.0001856, whisper_loss=0.09345, over 3894412.36 frames. ], batch size: 88, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:59:03,613 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.881e-01 2024-08-11 22:59:04,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1329320.0, ans=0.025 2024-08-11 22:59:39,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1329520.0, ans=0.125 2024-08-11 23:00:09,931 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 23:00:14,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2550, loss[loss=0.1075, beats_loss=0.01193, ecapa_loss=0.000194, whisper_loss=0.09365, over 20698.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01105, ecapa_loss=0.0001845, whisper_loss=0.09388, over 3889788.13 frames. ], batch size: 86, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:00:49,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1330020.0, ans=0.125 2024-08-11 23:00:51,804 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 23:00:57,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.600e+01 2.873e+01 3.308e+01 4.841e+01, threshold=5.745e+01, percent-clipped=0.0 2024-08-11 23:00:58,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.06 vs. limit=22.5 2024-08-11 23:01:07,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1330120.0, ans=0.125 2024-08-11 23:01:18,516 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 23:01:27,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1330220.0, ans=0.125 2024-08-11 23:01:34,674 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2600, loss[loss=0.1059, beats_loss=0.01007, ecapa_loss=0.000162, whisper_loss=0.09421, over 17558.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01108, ecapa_loss=0.0001841, whisper_loss=0.09306, over 3863901.12 frames. ], batch size: 68, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:01:37,737 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 23:01:57,964 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
33 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 23:02:02,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1330420.0, ans=0.125 2024-08-11 23:02:08,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1330520.0, ans=0.125 2024-08-11 23:02:30,678 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:02:31,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-11 23:02:46,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1330720.0, ans=0.125 2024-08-11 23:02:51,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2650, loss[loss=0.1189, beats_loss=0.009595, ecapa_loss=0.0001914, whisper_loss=0.1074, over 23324.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001843, whisper_loss=0.0928, over 3867551.21 frames. ], batch size: 93, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:03:19,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1330920.0, ans=0.125 2024-08-11 23:03:31,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1331020.0, ans=10.0 2024-08-11 23:03:35,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.699e+01 2.993e+01 3.555e+01 9.155e+01, threshold=5.987e+01, percent-clipped=1.0 2024-08-11 23:03:49,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1331120.0, ans=0.0 2024-08-11 23:03:54,086 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 23:03:59,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1331220.0, ans=0.125 2024-08-11 23:04:06,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1331220.0, ans=0.125 2024-08-11 23:04:06,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1331220.0, ans=0.125 2024-08-11 23:04:13,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2700, loss[loss=0.1047, beats_loss=0.009411, ecapa_loss=0.0001774, whisper_loss=0.09347, over 14735.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0111, ecapa_loss=0.0001848, whisper_loss=0.09231, over 3890036.86 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:04:13,285 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 23:04:21,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1331320.0, ans=0.125 2024-08-11 23:04:22,984 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 23:04:31,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1331420.0, ans=0.2 2024-08-11 23:04:32,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1331420.0, ans=0.125 2024-08-11 23:04:40,378 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
19 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 23:04:53,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1331520.0, ans=0.2 2024-08-11 23:04:55,914 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 23:04:56,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1331520.0, ans=0.0 2024-08-11 23:04:57,732 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 23:05:01,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1331620.0, ans=0.1 2024-08-11 23:05:21,747 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 23:05:26,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1331720.0, ans=0.125 2024-08-11 23:05:32,602 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2750, loss[loss=0.08947, beats_loss=0.01055, ecapa_loss=0.0001902, whisper_loss=0.07702, over 18835.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01107, ecapa_loss=0.0001855, whisper_loss=0.09225, over 3873996.25 frames. ], batch size: 78, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:05:50,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1331920.0, ans=0.125 2024-08-11 23:06:18,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.646e+01 2.996e+01 3.308e+01 5.705e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-11 23:06:23,206 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 23:06:25,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1332120.0, ans=0.035 2024-08-11 23:06:41,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1332220.0, ans=0.125 2024-08-11 23:06:44,799 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 7 from Vox, 40 fro AS 2024-08-11 23:06:54,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2800, loss[loss=0.1247, beats_loss=0.009679, ecapa_loss=0.0001872, whisper_loss=0.1131, over 23184.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001844, whisper_loss=0.0923, over 3888056.40 frames. ], batch size: 91, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:07:05,644 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 23:07:22,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. 
limit=15.0 2024-08-11 23:07:32,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1332520.0, ans=0.125 2024-08-11 23:07:40,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1332620.0, ans=0.1 2024-08-11 23:07:47,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1332620.0, ans=0.2 2024-08-11 23:08:04,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1332720.0, ans=0.125 2024-08-11 23:08:05,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1332720.0, ans=0.125 2024-08-11 23:08:07,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-11 23:08:12,987 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2850, loss[loss=0.09542, beats_loss=0.01029, ecapa_loss=0.0001831, whisper_loss=0.0833, over 16680.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001849, whisper_loss=0.09187, over 3879879.02 frames. ], batch size: 65, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:08:48,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1333020.0, ans=0.125 2024-08-11 23:08:57,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.704e+01 2.991e+01 3.400e+01 6.217e+01, threshold=5.982e+01, percent-clipped=1.0 2024-08-11 23:09:21,624 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 23:09:33,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1333320.0, ans=0.1 2024-08-11 23:09:33,842 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2900, loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001794, whisper_loss=0.09052, over 15176.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001858, whisper_loss=0.09262, over 3897068.35 frames. ], batch size: 60, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:09:35,536 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 23:09:55,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1333420.0, ans=0.0 2024-08-11 23:09:57,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1333420.0, ans=0.125 2024-08-11 23:10:10,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1333520.0, ans=0.1 2024-08-11 23:10:10,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333520.0, ans=0.1 2024-08-11 23:10:18,705 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 23:10:34,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333620.0, ans=0.1 2024-08-11 23:10:34,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1333620.0, ans=0.125 2024-08-11 23:10:34,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1333620.0, ans=0.125 2024-08-11 23:10:44,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1333720.0, ans=0.125 2024-08-11 23:10:54,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 2950, loss[loss=0.1011, beats_loss=0.01213, ecapa_loss=0.0002086, whisper_loss=0.0869, over 14876.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01118, ecapa_loss=0.0001866, whisper_loss=0.0931, over 3909191.93 frames. ], batch size: 62, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:10:57,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1333820.0, ans=0.0 2024-08-11 23:11:00,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1333820.0, ans=0.0 2024-08-11 23:11:01,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1333820.0, ans=0.0 2024-08-11 23:11:06,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1333820.0, ans=0.0 2024-08-11 23:11:32,697 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 23:11:40,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.642e+01 2.977e+01 3.342e+01 4.548e+01, threshold=5.953e+01, percent-clipped=0.0 2024-08-11 23:11:52,529 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-11 23:11:53,799 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 23:11:58,460 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 18 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 23:12:07,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1334220.0, ans=0.1 2024-08-11 23:12:15,912 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3000, loss[loss=0.1269, beats_loss=0.0086, ecapa_loss=0.0001921, whisper_loss=0.1164, over 22267.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001871, whisper_loss=0.0927, over 3895866.51 frames. ], batch size: 88, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:12:15,913 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-11 23:12:58,482 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006225, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 23:13:14,893 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on SV_voxceleb1: loss=0.004936, beats_loss=0, ecapa_loss=0.0004936, whisper_loss=0, over 939242.00 frames. 2024-08-11 23:15:19,804 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on AT_audioset: loss=0.02462, beats_loss=0.02462, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
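The validation records above log each task's loss in isolation (ASR_libri carries only whisper_loss, SV_voxceleb1 only ecapa_loss, AT_audioset only beats_loss), while the training records report a combined `loss`. That combined value is consistent with a weighted sum of the three components using the scales from the configuration header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that combination — the function name `combine_kd_losses` is illustrative, not taken from the training script:

```python
def combine_kd_losses(beats_loss, ecapa_loss, whisper_loss,
                      beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the per-task KD losses, using the scales logged
    in the run configuration (note ecapa_loss_scale=10.0)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Components from the epoch 10, batch 2500 record:
# loss=0.09492, beats_loss=0.01134, ecapa_loss=0.0001702, whisper_loss=0.08188
total = combine_kd_losses(0.01134, 0.0001702, 0.08188)
```

With these scales the batch-2500 components sum to 0.01134 + 10 × 0.0001702 + 0.08188 = 0.09492, matching the logged `loss` to the printed precision; the batch-2450 record (0.01234, 0.0001483, 0.09414 → 0.108) checks out the same way.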
2024-08-11 23:15:19,807 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-11 23:15:22,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1334320.0, ans=0.0 2024-08-11 23:16:21,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1334620.0, ans=0.125 2024-08-11 23:16:39,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3050, loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001727, whisper_loss=0.0917, over 22490.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0112, ecapa_loss=0.0001878, whisper_loss=0.0933, over 3951718.50 frames. ], batch size: 90, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:17:21,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.698e+01 2.929e+01 3.403e+01 4.861e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-11 23:17:32,173 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 23:17:32,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1335120.0, ans=0.0 2024-08-11 23:17:35,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1335120.0, ans=0.125 2024-08-11 23:17:51,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1335220.0, ans=0.0 2024-08-11 23:17:53,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3100, loss[loss=0.1253, beats_loss=0.008978, ecapa_loss=0.0001897, whisper_loss=0.1144, over 18985.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01117, ecapa_loss=0.0001876, whisper_loss=0.09373, over 3956650.87 frames. 
], batch size: 73, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:18:25,314 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 23:18:40,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1335620.0, ans=0.2 2024-08-11 23:18:44,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1335620.0, ans=0.125 2024-08-11 23:18:48,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1335620.0, ans=0.0 2024-08-11 23:18:52,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1335620.0, ans=0.0 2024-08-11 23:18:54,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1335720.0, ans=0.125 2024-08-11 23:19:05,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1335720.0, ans=0.0 2024-08-11 23:19:10,242 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3150, loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001534, whisper_loss=0.09053, over 15017.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0111, ecapa_loss=0.000189, whisper_loss=0.09363, over 3940493.88 frames. ], batch size: 53, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:19:17,640 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-11 23:19:34,519 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 23:19:38,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1335920.0, ans=10.0 2024-08-11 23:19:52,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.477e+01 2.769e+01 3.279e+01 6.467e+01, threshold=5.538e+01, percent-clipped=1.0 2024-08-11 23:20:15,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-11 23:20:18,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1336220.0, ans=0.0 2024-08-11 23:20:19,168 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 23:20:24,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3200, loss[loss=0.1014, beats_loss=0.01176, ecapa_loss=0.0002118, whisper_loss=0.08756, over 16170.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01121, ecapa_loss=0.0001886, whisper_loss=0.09304, over 3914248.06 frames. ], batch size: 65, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:20:29,400 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 23:20:35,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1336320.0, ans=0.1 2024-08-11 23:20:53,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1336520.0, ans=0.0 2024-08-11 23:21:04,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1336520.0, ans=0.0 2024-08-11 23:21:18,494 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 23:21:20,311 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 9 from Vox, 35 fro AS 2024-08-11 23:21:31,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2024-08-11 23:21:35,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1336820.0, ans=0.0 2024-08-11 23:21:37,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3250, loss[loss=0.1197, beats_loss=0.008535, ecapa_loss=0.0001662, whisper_loss=0.1095, over 15230.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01117, ecapa_loss=0.000189, whisper_loss=0.094, over 3938383.41 frames. ], batch size: 54, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:21:55,110 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 23:22:12,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1337020.0, ans=0.1 2024-08-11 23:22:18,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.451e+01 2.863e+01 3.216e+01 4.803e+01, threshold=5.726e+01, percent-clipped=0.0 2024-08-11 23:22:21,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1337120.0, ans=0.02 2024-08-11 23:22:21,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1337120.0, ans=0.1 2024-08-11 23:22:46,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1337220.0, ans=10.0 2024-08-11 23:22:48,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, 
batch_count=1337220.0, ans=0.125 2024-08-11 23:22:52,294 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3300, loss[loss=0.1191, beats_loss=0.008483, ecapa_loss=0.0001851, whisper_loss=0.1088, over 23414.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01109, ecapa_loss=0.0001893, whisper_loss=0.09421, over 3928551.95 frames. ], batch size: 91, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:22:56,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1337320.0, ans=0.0 2024-08-11 23:23:06,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-11 23:23:40,752 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 23:23:43,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1337620.0, ans=0.0 2024-08-11 23:23:54,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1337720.0, ans=0.125 2024-08-11 23:24:06,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3350, loss[loss=0.1199, beats_loss=0.0093, ecapa_loss=0.0001629, whisper_loss=0.109, over 18687.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01101, ecapa_loss=0.0001867, whisper_loss=0.09474, over 3931413.79 frames. 
], batch size: 71, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:24:11,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1337820.0, ans=0.125 2024-08-11 23:24:20,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1337920.0, ans=0.0 2024-08-11 23:24:29,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1337920.0, ans=0.0 2024-08-11 23:24:45,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1338020.0, ans=0.0 2024-08-11 23:24:45,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.630e+01 2.898e+01 3.415e+01 6.649e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-11 23:24:47,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1338120.0, ans=0.125 2024-08-11 23:24:54,788 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 23:25:17,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3400, loss[loss=0.1141, beats_loss=0.009158, ecapa_loss=0.0001857, whisper_loss=0.1031, over 24115.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0111, ecapa_loss=0.0001855, whisper_loss=0.09394, over 3913941.93 frames. ], batch size: 92, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:25:19,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1338320.0, ans=0.0 2024-08-11 23:25:31,065 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 15 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 23:25:33,589 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 23:25:34,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1338420.0, ans=0.125 2024-08-11 23:25:54,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1338520.0, ans=0.125 2024-08-11 23:26:12,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1338720.0, ans=0.0 2024-08-11 23:26:15,795 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:26:17,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1338720.0, ans=0.0 2024-08-11 23:26:21,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1338720.0, ans=0.125 2024-08-11 23:26:27,331 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3450, loss[loss=0.123, beats_loss=0.009812, ecapa_loss=0.0001685, whisper_loss=0.1115, over 24205.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01123, ecapa_loss=0.0001846, whisper_loss=0.09314, over 3924279.83 frames. ], batch size: 90, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:26:27,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1338820.0, ans=0.025 2024-08-11 23:26:31,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1338820.0, ans=0.2 2024-08-11 23:26:32,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=15.0 2024-08-11 23:26:49,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1338920.0, ans=0.0 2024-08-11 23:26:59,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.62 vs. limit=10.0 2024-08-11 23:27:08,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.641e+01 2.848e+01 3.378e+01 1.355e+02, threshold=5.696e+01, percent-clipped=1.0 2024-08-11 23:27:19,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1339120.0, ans=0.0 2024-08-11 23:27:37,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1339320.0, ans=0.125 2024-08-11 23:27:38,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3500, loss[loss=0.09431, beats_loss=0.01367, ecapa_loss=0.000175, whisper_loss=0.07888, over 18093.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01125, ecapa_loss=0.0001847, whisper_loss=0.09239, over 3883820.59 frames. ], batch size: 75, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:27:46,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1339320.0, ans=0.1 2024-08-11 23:27:47,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-11 23:27:48,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.03 vs. 
limit=22.5 2024-08-11 23:28:06,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1339520.0, ans=0.125 2024-08-11 23:28:22,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1339620.0, ans=0.0 2024-08-11 23:28:25,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1339620.0, ans=0.125 2024-08-11 23:28:35,310 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 23:28:36,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-11 23:28:36,578 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 23:28:43,740 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 23:28:50,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3550, loss[loss=0.1295, beats_loss=0.007222, ecapa_loss=0.0001942, whisper_loss=0.1203, over 14740.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.0001835, whisper_loss=0.09286, over 3876417.82 frames. ], batch size: 55, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:29:10,314 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 23:29:25,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. 
limit=22.5 2024-08-11 23:29:32,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.587e+01 2.900e+01 3.239e+01 4.496e+01, threshold=5.800e+01, percent-clipped=0.0 2024-08-11 23:29:43,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1340120.0, ans=0.125 2024-08-11 23:30:00,291 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 23:30:06,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3600, loss[loss=0.1067, beats_loss=0.01057, ecapa_loss=0.0001841, whisper_loss=0.09433, over 20890.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01113, ecapa_loss=0.0001841, whisper_loss=0.09305, over 3901662.08 frames. ], batch size: 86, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:30:06,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1340320.0, ans=0.125 2024-08-11 23:30:10,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2024-08-11 23:30:18,654 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-11 23:30:28,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1340420.0, ans=0.125 2024-08-11 23:30:35,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1340420.0, ans=0.1 2024-08-11 23:31:02,158 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 17 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 23:31:23,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3650, loss[loss=0.09381, beats_loss=0.009666, ecapa_loss=0.0001569, whisper_loss=0.08258, over 19752.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01118, ecapa_loss=0.0001849, whisper_loss=0.09292, over 3877466.83 frames. ], batch size: 72, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:31:27,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-11 23:31:27,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1340820.0, ans=0.2 2024-08-11 23:31:37,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1340920.0, ans=0.125 2024-08-11 23:31:46,763 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 23:32:01,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-11 23:32:08,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.620e+01 2.825e+01 3.170e+01 6.141e+01, threshold=5.649e+01, percent-clipped=1.0 2024-08-11 23:32:32,831 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 from AS 2024-08-11 23:32:34,662 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS 2024-08-11 23:32:35,030 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.018e-01 2024-08-11 23:32:36,027 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 30 from Vox, 13 from AS 2024-08-11 23:32:39,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. 
limit=15.0 2024-08-11 23:32:41,462 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3700, loss[loss=0.1081, beats_loss=0.009521, ecapa_loss=0.000201, whisper_loss=0.09654, over 20463.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01114, ecapa_loss=0.0001852, whisper_loss=0.09322, over 3911244.57 frames. ], batch size: 81, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:32:56,008 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-11 23:32:57,768 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 23:33:02,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1341420.0, ans=0.125 2024-08-11 23:33:05,006 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS 2024-08-11 23:33:09,281 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 from AS 2024-08-11 23:33:13,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1341520.0, ans=0.09899494936611666 2024-08-11 23:33:22,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1341520.0, ans=0.2 2024-08-11 23:33:22,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1341520.0, ans=0.125 2024-08-11 23:33:28,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2024-08-11 23:33:33,569 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
17 from LS+wenet, 8 from Vox, 33 from AS 2024-08-11 23:33:57,659 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3750, loss[loss=0.08667, beats_loss=0.01137, ecapa_loss=0.0002085, whisper_loss=0.07321, over 20315.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01121, ecapa_loss=0.0001874, whisper_loss=0.09246, over 3888440.25 frames. ], batch size: 87, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:34:31,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1342020.0, ans=0.125 2024-08-11 23:34:35,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1342020.0, ans=0.2 2024-08-11 23:34:41,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.517e+01 2.756e+01 3.054e+01 4.813e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-11 23:34:47,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5 2024-08-11 23:34:49,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1342120.0, ans=0.0 2024-08-11 23:35:01,854 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 from AS 2024-08-11 23:35:11,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3800, loss[loss=0.08007, beats_loss=0.0116, ecapa_loss=0.0001757, whisper_loss=0.06672, over 17299.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01122, ecapa_loss=0.0001888, whisper_loss=0.0923, over 3903542.96 frames. ], batch size: 65, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:35:18,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1342320.0, ans=0.2 2024-08-11 23:35:28,733 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
33 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 23:35:42,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1342520.0, ans=0.0 2024-08-11 23:35:44,427 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.411e-02 2024-08-11 23:35:45,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1342520.0, ans=0.0 2024-08-11 23:36:01,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-11 23:36:07,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1342620.0, ans=0.1 2024-08-11 23:36:13,778 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 from AS 2024-08-11 23:36:16,666 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 from AS 2024-08-11 23:36:25,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3850, loss[loss=0.1176, beats_loss=0.009201, ecapa_loss=0.0002024, whisper_loss=0.1064, over 16856.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01118, ecapa_loss=0.0001885, whisper_loss=0.09275, over 3905232.14 frames. 
], batch size: 66, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:36:45,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1342920.0, ans=0.125 2024-08-11 23:37:04,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.584e+01 2.893e+01 3.554e+01 5.203e+01, threshold=5.787e+01, percent-clipped=0.0 2024-08-11 23:37:10,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1343120.0, ans=0.1 2024-08-11 23:37:11,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. limit=5.0 2024-08-11 23:37:20,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1343220.0, ans=0.1 2024-08-11 23:37:20,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1343220.0, ans=0.0 2024-08-11 23:37:33,267 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3900, loss[loss=0.1005, beats_loss=0.01163, ecapa_loss=0.0001601, whisper_loss=0.08732, over 15575.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0112, ecapa_loss=0.0001868, whisper_loss=0.0932, over 3926051.99 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:37:48,945 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 23:37:50,271 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 from AS 2024-08-11 23:37:53,187 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 23:38:02,514 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
24 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 23:38:03,852 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 26 from Vox, 26 from AS 2024-08-11 23:38:06,605 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 from AS 2024-08-11 23:38:25,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1343620.0, ans=0.1 2024-08-11 23:38:41,150 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 from AS 2024-08-11 23:38:42,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 3950, loss[loss=0.0921, beats_loss=0.01314, ecapa_loss=0.0001586, whisper_loss=0.07738, over 17968.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0001884, whisper_loss=0.09367, over 3920811.05 frames. ], batch size: 70, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:39:05,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1343920.0, ans=0.2 2024-08-11 23:39:06,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-08-11 23:39:14,521 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:39:17,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.40 vs. 
limit=22.5 2024-08-11 23:39:19,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1344020.0, ans=0.125 2024-08-11 23:39:22,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.690e+01 2.958e+01 3.481e+01 5.578e+01, threshold=5.915e+01, percent-clipped=0.0 2024-08-11 23:39:25,278 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS 2024-08-11 23:39:51,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4000, loss[loss=0.08412, beats_loss=0.0108, ecapa_loss=0.0001943, whisper_loss=0.07137, over 21760.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01103, ecapa_loss=0.0001883, whisper_loss=0.09468, over 3922998.70 frames. ], batch size: 87, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:39:58,453 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 13 from Vox, 39 from AS 2024-08-11 23:39:58,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1344320.0, ans=0.125 2024-08-11 23:39:58,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1344320.0, ans=0.0 2024-08-11 23:40:02,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1344320.0, ans=0.0 2024-08-11 23:40:10,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1344420.0, ans=0.2 2024-08-11 23:40:10,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1344420.0, ans=0.0 2024-08-11 23:40:12,698 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 23:40:13,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5 2024-08-11 23:40:19,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344520.0, ans=0.1 2024-08-11 23:40:27,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344520.0, ans=0.1 2024-08-11 23:40:29,025 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 from AS 2024-08-11 23:40:30,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1344520.0, ans=0.125 2024-08-11 23:40:30,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1344520.0, ans=0.2 2024-08-11 23:40:40,238 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 from AS 2024-08-11 23:40:51,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1344720.0, ans=0.0 2024-08-11 23:41:00,549 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4050, loss[loss=0.09685, beats_loss=0.01039, ecapa_loss=0.0002142, whisper_loss=0.08432, over 21866.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01098, ecapa_loss=0.0001892, whisper_loss=0.09476, over 3897103.32 frames. ], batch size: 90, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:41:00,741 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS 2024-08-11 23:41:06,100 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 23:41:12,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344820.0, ans=0.1 2024-08-11 23:41:37,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1345020.0, ans=0.125 2024-08-11 23:41:39,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.615e+01 3.014e+01 3.367e+01 5.886e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 23:42:03,643 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS 2024-08-11 23:42:06,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1345220.0, ans=0.0 2024-08-11 23:42:08,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4100, loss[loss=0.0943, beats_loss=0.01118, ecapa_loss=0.0002116, whisper_loss=0.081, over 15196.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01099, ecapa_loss=0.0001908, whisper_loss=0.09412, over 3868118.90 frames. ], batch size: 63, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:42:23,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1345420.0, ans=0.125 2024-08-11 23:42:43,824 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 from AS 2024-08-11 23:42:55,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1345620.0, ans=0.125 2024-08-11 23:42:57,946 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
22 from LS+wenet, 26 from Vox, 42 from AS 2024-08-11 23:42:58,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1345620.0, ans=0.035 2024-08-11 23:43:04,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1345720.0, ans=0.2 2024-08-11 23:43:18,112 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4150, loss[loss=0.1095, beats_loss=0.01034, ecapa_loss=0.0001722, whisper_loss=0.09746, over 21142.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01101, ecapa_loss=0.0001912, whisper_loss=0.09409, over 3876712.45 frames. ], batch size: 82, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:43:29,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1345820.0, ans=0.125 2024-08-11 23:43:35,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1345920.0, ans=0.125 2024-08-11 23:43:37,554 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 27 from Vox, 23 from AS 2024-08-11 23:43:52,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346020.0, ans=0.1 2024-08-11 23:43:58,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.669e+01 2.869e+01 3.344e+01 4.634e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-11 23:44:13,971 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 from AS 2024-08-11 23:44:16,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1346220.0, ans=0.125 2024-08-11 23:44:20,761 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
24 from LS+wenet, 23 from Vox, 23 from AS 2024-08-11 23:44:22,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1346220.0, ans=0.125 2024-08-11 23:44:27,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4200, loss[loss=0.104, beats_loss=0.0138, ecapa_loss=0.0001503, whisper_loss=0.08874, over 23324.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01101, ecapa_loss=0.0001908, whisper_loss=0.09403, over 3870242.89 frames. ], batch size: 90, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:44:28,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1346320.0, ans=0.125 2024-08-11 23:44:35,257 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-11 23:44:47,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1346420.0, ans=0.2 2024-08-11 23:45:07,589 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS 2024-08-11 23:45:07,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1346620.0, ans=0.125 2024-08-11 23:45:07,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1346620.0, ans=0.0 2024-08-11 23:45:11,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1346620.0, ans=0.125 2024-08-11 23:45:33,924 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 21 from Vox, 32 from AS 2024-08-11 23:45:36,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4250, loss[loss=0.1265, beats_loss=0.01172, ecapa_loss=0.0001596, whisper_loss=0.1132, over 18826.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01099, ecapa_loss=0.0001893, whisper_loss=0.09423, over 3900311.31 frames. ], batch size: 71, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:45:39,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1346820.0, ans=0.125 2024-08-11 23:45:39,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1346820.0, ans=0.125 2024-08-11 23:45:40,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1346820.0, ans=0.0 2024-08-11 23:45:41,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1346820.0, ans=0.125 2024-08-11 23:45:50,668 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 26 from Vox, 33 from AS 2024-08-11 23:46:12,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1347020.0, ans=0.0 2024-08-11 23:46:13,514 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
25 from LS+wenet, 16 from Vox, 37 from AS 2024-08-11 23:46:17,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.600e+01 2.838e+01 3.253e+01 4.399e+01, threshold=5.676e+01, percent-clipped=0.0 2024-08-11 23:46:20,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1347120.0, ans=0.125 2024-08-11 23:46:20,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1347120.0, ans=0.0 2024-08-11 23:46:24,305 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 from AS 2024-08-11 23:46:31,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1347220.0, ans=0.125 2024-08-11 23:46:35,553 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 from AS 2024-08-11 23:46:46,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4300, loss[loss=0.08463, beats_loss=0.01407, ecapa_loss=0.0001903, whisper_loss=0.06865, over 20021.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01105, ecapa_loss=0.000189, whisper_loss=0.09353, over 3856371.40 frames. 
], batch size: 82, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:47:12,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1347420.0, ans=0.0 2024-08-11 23:47:13,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1347520.0, ans=0.05 2024-08-11 23:47:55,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1347820.0, ans=0.125 2024-08-11 23:47:56,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4350, loss[loss=0.0981, beats_loss=0.01108, ecapa_loss=0.0001705, whisper_loss=0.08531, over 17441.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.011, ecapa_loss=0.0001891, whisper_loss=0.09366, over 3844528.07 frames. ], batch size: 67, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:48:01,884 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 23:48:11,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1347920.0, ans=0.1 2024-08-11 23:48:36,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.570e+01 2.850e+01 3.397e+01 5.504e+01, threshold=5.701e+01, percent-clipped=0.0 2024-08-11 23:48:42,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1348120.0, ans=0.125 2024-08-11 23:48:47,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1348120.0, ans=0.125 2024-08-11 23:48:48,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=12.0 2024-08-11 23:48:52,058 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 22 from Vox, 42 from AS 2024-08-11 23:49:07,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4400, loss[loss=0.1253, beats_loss=0.009093, ecapa_loss=0.0001641, whisper_loss=0.1146, over 23861.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01105, ecapa_loss=0.0001877, whisper_loss=0.09338, over 3846859.10 frames. ], batch size: 91, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:49:10,245 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 23:49:11,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1348320.0, ans=0.0 2024-08-11 23:49:11,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1348320.0, ans=0.125 2024-08-11 23:49:19,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1348320.0, ans=0.0 2024-08-11 23:49:27,736 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-11 23:49:29,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1348420.0, ans=0.1 2024-08-11 23:49:40,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1348520.0, ans=0.1 2024-08-11 23:49:50,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1348620.0, ans=0.05 2024-08-11 23:50:02,725 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 17 from Vox, 40 from AS 2024-08-11 23:50:04,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1348720.0, ans=0.125 2024-08-11 23:50:07,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1348720.0, ans=0.125 2024-08-11 23:50:15,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1348720.0, ans=0.125 2024-08-11 23:50:19,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4450, loss[loss=0.1054, beats_loss=0.00986, ecapa_loss=0.0001809, whisper_loss=0.09373, over 19399.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01102, ecapa_loss=0.0001871, whisper_loss=0.09262, over 3826612.15 frames. ], batch size: 76, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:50:29,858 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 23:50:59,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1349020.0, ans=0.0 2024-08-11 23:51:01,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.643e+01 3.000e+01 3.439e+01 5.029e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-11 23:51:12,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1349120.0, ans=0.2 2024-08-11 23:51:23,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1349220.0, ans=0.2 2024-08-11 23:51:27,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1349220.0, ans=0.125 2024-08-11 23:51:29,185 INFO [scaling.py:1024] (1/4) Whitening: 
name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-11 23:51:29,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4500, loss[loss=0.1029, beats_loss=0.01167, ecapa_loss=0.0001832, whisper_loss=0.08937, over 15090.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01099, ecapa_loss=0.0001879, whisper_loss=0.09247, over 3827609.46 frames. ], batch size: 59, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:51:30,005 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 23:51:31,601 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-11 23:51:45,403 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-11 23:51:59,456 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS 2024-08-11 23:52:01,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=12.0 2024-08-11 23:52:04,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1349520.0, ans=0.0 2024-08-11 23:52:09,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2024-08-11 23:52:23,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1349620.0, ans=0.1 2024-08-11 23:52:37,708 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 15 from Vox, 38 from AS 2024-08-11 23:52:38,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4550, loss[loss=0.1122, beats_loss=0.01167, ecapa_loss=0.000151, whisper_loss=0.09904, over 20919.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001878, whisper_loss=0.09236, over 3841850.74 frames. ], batch size: 82, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:53:09,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1350020.0, ans=0.125 2024-08-11 23:53:19,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.529e+01 2.869e+01 3.379e+01 6.425e+01, threshold=5.739e+01, percent-clipped=1.0 2024-08-11 23:53:21,987 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 23:53:24,676 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 from AS 2024-08-11 23:53:36,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-11 23:53:38,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1350220.0, ans=0.0 2024-08-11 23:53:48,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4600, loss[loss=0.09392, beats_loss=0.01267, ecapa_loss=0.0001629, whisper_loss=0.07962, over 23513.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001875, whisper_loss=0.09201, over 3859864.43 frames. ], batch size: 92, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:53:55,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1350320.0, ans=15.0 2024-08-11 23:54:00,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1350320.0, ans=0.0 2024-08-11 23:54:03,510 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
14 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 23:54:11,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1350420.0, ans=0.1 2024-08-11 23:54:15,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1350420.0, ans=0.0 2024-08-11 23:54:15,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1350420.0, ans=0.125 2024-08-11 23:54:23,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1350520.0, ans=0.125 2024-08-11 23:54:25,807 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.631e+02 2024-08-11 23:54:35,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1350620.0, ans=0.125 2024-08-11 23:54:48,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1350720.0, ans=0.09899494936611666 2024-08-11 23:54:52,871 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 from AS 2024-08-11 23:55:00,710 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4650, loss[loss=0.1036, beats_loss=0.01251, ecapa_loss=0.0001475, whisper_loss=0.08966, over 23645.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0001877, whisper_loss=0.09198, over 3877842.99 frames. ], batch size: 91, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:55:32,276 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 23:55:33,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1351020.0, ans=0.0 2024-08-11 23:55:40,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.63 vs. limit=22.5 2024-08-11 23:55:43,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.678e+01 3.059e+01 3.452e+01 5.229e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-11 23:55:46,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1351120.0, ans=0.125 2024-08-11 23:56:00,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1351220.0, ans=0.2 2024-08-11 23:56:02,569 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 23:56:05,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351220.0, ans=0.1 2024-08-11 23:56:06,583 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 23:56:13,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4700, loss[loss=0.1101, beats_loss=0.01122, ecapa_loss=0.0001724, whisper_loss=0.09718, over 22995.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01111, ecapa_loss=0.0001866, whisper_loss=0.09356, over 3905954.58 frames. 
], batch size: 92, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:56:21,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1351320.0, ans=0.125 2024-08-11 23:56:22,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1351320.0, ans=0.2 2024-08-11 23:56:24,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-11 23:56:37,790 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 23:56:46,162 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2024-08-11 23:57:08,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1351620.0, ans=0.125 2024-08-11 23:57:13,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1351720.0, ans=0.1 2024-08-11 23:57:16,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-11 23:57:17,599 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 23:57:27,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4750, loss[loss=0.1068, beats_loss=0.01225, ecapa_loss=0.0001443, whisper_loss=0.09308, over 21590.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001881, whisper_loss=0.09283, over 3903926.30 frames. ], batch size: 82, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:57:37,891 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
30 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 23:57:39,240 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 23:57:59,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1352020.0, ans=0.125 2024-08-11 23:58:01,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-11 23:58:08,080 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 23:58:09,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.685e+01 3.044e+01 3.744e+01 5.202e+01, threshold=6.087e+01, percent-clipped=0.0 2024-08-11 23:58:23,262 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 28 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-11 23:58:36,035 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 23:58:41,673 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4800, loss[loss=0.09785, beats_loss=0.01353, ecapa_loss=0.0001371, whisper_loss=0.08295, over 14290.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01117, ecapa_loss=0.0001886, whisper_loss=0.09301, over 3895351.02 frames. ], batch size: 57, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:58:42,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1352320.0, ans=0.125 2024-08-11 23:58:48,570 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
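Each "A total of N cuts" line from `train_multi_KD3.py:844` reports the per-dataset makeup of a batch, and the three source counts (LS+wenet, Vox, AS) always sum to the total (e.g. 30 + 14 + 36 = 80 above). Note that "fro AS" in these lines appears to be a typo for "from AS" in the logging template. A small sketch of that bookkeeping, with a hypothetical helper name:

```python
def summarize_cuts(ls_wenet: int, vox: int, audioset: int) -> str:
    """Format a batch-composition message in the style of the
    train_multi_KD3.py:844 log lines (hypothetical helper; the
    total is the sum of the per-dataset counts)."""
    total = ls_wenet + vox + audioset
    return (f"A total of {total} cuts. "
            f"{ls_wenet} from LS+wenet, {vox} from Vox, {audioset} from AS")
```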
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 23:58:48,831 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 23:58:53,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=1352320.0, ans=22.5 2024-08-11 23:58:56,567 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 16 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 23:59:09,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1352420.0, ans=0.125 2024-08-11 23:59:14,854 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 23:59:18,535 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 23:59:19,771 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 23:59:37,846 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 23:59:47,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1352720.0, ans=0.07 2024-08-11 23:59:57,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4850, loss[loss=0.1141, beats_loss=0.01076, ecapa_loss=0.0001795, whisper_loss=0.1016, over 22885.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01117, ecapa_loss=0.0001889, whisper_loss=0.09255, over 3877016.56 frames. ], batch size: 91, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:00:15,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1352920.0, ans=0.2 2024-08-12 00:00:33,960 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 00:00:39,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.736e+01 3.106e+01 3.475e+01 1.081e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-12 00:00:59,866 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 00:01:04,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1353220.0, ans=0.125 2024-08-12 00:01:07,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1353220.0, ans=0.125 2024-08-12 00:01:10,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4900, loss[loss=0.1338, beats_loss=0.007296, ecapa_loss=0.0002066, whisper_loss=0.1244, over 22739.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01109, ecapa_loss=0.0001881, whisper_loss=0.09344, over 3895473.09 frames. ], batch size: 88, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:01:12,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1353320.0, ans=0.2 2024-08-12 00:01:20,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1353320.0, ans=0.125 2024-08-12 00:01:27,943 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 00:01:36,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-12 00:01:43,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1353520.0, ans=0.0 2024-08-12 00:02:14,325 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
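The `optim.py:476` lines report grad-norm quartiles over a recent window plus a clipping threshold. In this log the threshold consistently tracks `Clipping_scale` times the median quartile (2.0 × 3.106e+01 ≈ 6.213e+01 here; 2.0 × 2.869e+01 ≈ 5.739e+01 earlier), and `percent-clipped` counts how often a gradient exceeds it. A sketch of that rule under this assumption; this is not the actual icefall `optim.py` implementation.

```python
import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    # Threshold = clipping_scale x median of recently observed grad norms,
    # matching the relation visible in the log (assumed, not confirmed).
    return clipping_scale * statistics.median(recent_grad_norms)

def clip_factor(grad_norm, threshold):
    # Gradients whose norm exceeds the threshold are scaled down to it;
    # the log's "percent-clipped" counts how often this factor is < 1.
    return min(1.0, threshold / grad_norm)
```

For the quartiles on this line (2.218e+01, 2.736e+01, 3.106e+01, 3.475e+01, 1.081e+02), the median 31.06 doubled gives 62.12, within rounding of the reported threshold 6.213e+01.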
28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 00:02:19,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1353720.0, ans=0.0 2024-08-12 00:02:24,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 4950, loss[loss=0.09282, beats_loss=0.01153, ecapa_loss=0.0001841, whisper_loss=0.07945, over 21685.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01113, ecapa_loss=0.0001865, whisper_loss=0.09287, over 3888290.33 frames. ], batch size: 91, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:02:35,782 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 00:02:36,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1353820.0, ans=0.2 2024-08-12 00:02:44,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1353920.0, ans=0.125 2024-08-12 00:02:56,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1354020.0, ans=0.125 2024-08-12 00:03:07,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.588e+01 2.867e+01 3.231e+01 4.752e+01, threshold=5.733e+01, percent-clipped=0.0 2024-08-12 00:03:26,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1354220.0, ans=0.2 2024-08-12 00:03:27,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1354220.0, ans=0.2 2024-08-12 00:03:37,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5000, loss[loss=0.1229, beats_loss=0.0107, ecapa_loss=0.0001873, whisper_loss=0.1103, over 23042.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0111, ecapa_loss=0.0001863, whisper_loss=0.09341, over 3891947.72 frames. 
], batch size: 90, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:03:42,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1354320.0, ans=0.125 2024-08-12 00:03:50,968 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 00:04:06,583 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 00:04:14,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2024-08-12 00:04:22,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-12 00:04:48,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1354820.0, ans=0.0 2024-08-12 00:04:48,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5050, loss[loss=0.1123, beats_loss=0.00952, ecapa_loss=0.000204, whisper_loss=0.1007, over 22518.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01115, ecapa_loss=0.0001864, whisper_loss=0.09332, over 3891223.62 frames. ], batch size: 89, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:04:51,767 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 00:05:03,266 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 00:05:05,709 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 00:05:20,132 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 00:05:29,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.677e+01 3.041e+01 3.640e+01 6.697e+01, threshold=6.081e+01, percent-clipped=3.0 2024-08-12 00:05:41,965 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 00:05:42,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1355120.0, ans=0.07 2024-08-12 00:05:45,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1355220.0, ans=0.125 2024-08-12 00:05:47,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1355220.0, ans=0.125 2024-08-12 00:05:50,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1355220.0, ans=0.125 2024-08-12 00:06:00,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5100, loss[loss=0.06767, beats_loss=0.01335, ecapa_loss=0.0002002, whisper_loss=0.05232, over 18325.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01114, ecapa_loss=0.0001863, whisper_loss=0.0933, over 3902064.30 frames. ], batch size: 79, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:06:03,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1355320.0, ans=0.125 2024-08-12 00:06:18,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. 
limit=10.0 2024-08-12 00:06:29,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1355520.0, ans=0.025 2024-08-12 00:06:35,775 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:06:46,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1355620.0, ans=0.125 2024-08-12 00:06:50,589 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 00:06:53,520 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 00:07:09,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5150, loss[loss=0.1423, beats_loss=0.007322, ecapa_loss=0.0001926, whisper_loss=0.1331, over 23867.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0001841, whisper_loss=0.09306, over 3902793.32 frames. 
], batch size: 92, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:07:18,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1355820.0, ans=0.2 2024-08-12 00:07:20,355 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:07:43,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1356020.0, ans=0.0 2024-08-12 00:07:50,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.961e+01 3.572e+01 5.621e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-12 00:07:51,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1356120.0, ans=0.125 2024-08-12 00:07:57,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2024-08-12 00:08:01,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1356120.0, ans=0.0 2024-08-12 00:08:05,245 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 00:08:07,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-12 00:08:08,363 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
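The `ScheduledFloat` lines record hyperparameters (skip rates, dropout probabilities, balancer limits) whose values are functions of `batch_count`. A minimal stand-in sketch of a batch-count-driven piecewise-linear schedule follows; icefall's real `ScheduledFloat` class supports more (float arithmetic, defaults), so this is illustrative only.

```python
def scheduled_float(batch_count, schedule):
    """Piecewise-linear interpolation over (batch_count, value) breakpoints.
    Illustrative stand-in for icefall's ScheduledFloat, which logs its
    current value alongside batch_count as seen above."""
    pts = sorted(schedule)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    if batch_count >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= batch_count <= x1:
            # interpolate linearly between the surrounding breakpoints
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
```

For example, with breakpoints `[(0, 0.3), (1000, 0.1)]` the value decays from 0.3 to 0.1 over the first 1000 batches and then stays at 0.1, which is the typical shape of the skip-rate schedules logged here.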
24 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 00:08:15,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1356220.0, ans=0.0 2024-08-12 00:08:19,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1356320.0, ans=0.125 2024-08-12 00:08:20,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5200, loss[loss=0.1105, beats_loss=0.01096, ecapa_loss=0.0001303, whisper_loss=0.09825, over 18338.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01117, ecapa_loss=0.0001839, whisper_loss=0.09282, over 3877208.77 frames. ], batch size: 68, lr: 6.37e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:08:23,771 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 00:08:25,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1356320.0, ans=0.125 2024-08-12 00:08:29,441 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 00:08:39,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1356420.0, ans=0.125 2024-08-12 00:08:42,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1356420.0, ans=0.2 2024-08-12 00:08:48,972 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 00:09:03,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1356620.0, ans=0.125 2024-08-12 00:09:07,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. 
limit=15.0 2024-08-12 00:09:14,291 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 00:09:18,266 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 00:09:30,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5250, loss[loss=0.09703, beats_loss=0.01133, ecapa_loss=0.000172, whisper_loss=0.08398, over 15907.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001843, whisper_loss=0.09268, over 3875308.56 frames. ], batch size: 64, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:09:31,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1356820.0, ans=0.125 2024-08-12 00:09:31,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1356820.0, ans=0.2 2024-08-12 00:09:39,622 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 00:09:39,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1356820.0, ans=0.0 2024-08-12 00:10:08,841 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 00:10:11,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.537e+01 2.858e+01 3.258e+01 4.916e+01, threshold=5.717e+01, percent-clipped=0.0 2024-08-12 00:10:33,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2024-08-12 00:10:33,897 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 00:10:39,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1357320.0, ans=0.0 2024-08-12 00:10:40,569 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5300, loss[loss=0.109, beats_loss=0.01218, ecapa_loss=0.0001748, whisper_loss=0.09511, over 14862.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001861, whisper_loss=0.09274, over 3877097.10 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:10:43,635 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-12 00:11:14,499 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 00:11:20,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1357520.0, ans=0.1 2024-08-12 00:11:24,201 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 00:11:42,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-08-12 00:11:48,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1357820.0, ans=0.0 2024-08-12 00:11:49,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5350, loss[loss=0.1001, beats_loss=0.01345, ecapa_loss=0.0001919, whisper_loss=0.08477, over 19190.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01109, ecapa_loss=0.0001864, whisper_loss=0.09287, over 3874414.25 frames. 
], batch size: 79, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:11:49,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1357820.0, ans=0.0 2024-08-12 00:12:09,823 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 00:12:16,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-12 00:12:26,943 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:12:30,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.572e+01 2.816e+01 3.245e+01 5.813e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 00:12:53,228 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 00:13:01,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5400, loss[loss=0.112, beats_loss=0.01148, ecapa_loss=0.0001489, whisper_loss=0.09905, over 17332.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001846, whisper_loss=0.09326, over 3847679.02 frames. ], batch size: 67, lr: 6.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:13:09,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1358320.0, ans=0.0 2024-08-12 00:13:13,027 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 00:13:15,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1358420.0, ans=0.2 2024-08-12 00:13:17,088 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-12 00:13:25,282 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 00:13:30,281 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 00:13:37,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1358520.0, ans=0.1 2024-08-12 00:13:53,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1358620.0, ans=0.2 2024-08-12 00:14:03,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1358720.0, ans=0.2 2024-08-12 00:14:18,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5450, loss[loss=0.1226, beats_loss=0.01188, ecapa_loss=0.0002213, whisper_loss=0.1085, over 22665.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01119, ecapa_loss=0.0001833, whisper_loss=0.093, over 3839909.85 frames. ], batch size: 95, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:14:27,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1358820.0, ans=0.1 2024-08-12 00:14:34,165 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-12 00:14:36,281 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 00:14:40,514 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 00:14:43,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1358920.0, ans=0.125 2024-08-12 00:14:53,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1359020.0, ans=0.125 2024-08-12 00:14:53,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1359020.0, ans=0.1 2024-08-12 00:15:02,596 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 00:15:05,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.617e+01 2.957e+01 3.359e+01 7.305e+01, threshold=5.914e+01, percent-clipped=2.0 2024-08-12 00:15:18,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0 2024-08-12 00:15:46,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5500, loss[loss=0.09532, beats_loss=0.01387, ecapa_loss=0.0001542, whisper_loss=0.07991, over 16930.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01118, ecapa_loss=0.0001826, whisper_loss=0.09326, over 3868243.62 frames. ], batch size: 66, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:16:41,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1359520.0, ans=0.1 2024-08-12 00:16:42,031 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-12 00:16:43,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.11 vs. 
limit=15.0 2024-08-12 00:16:50,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1359620.0, ans=0.07 2024-08-12 00:16:50,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1359620.0, ans=0.125 2024-08-12 00:16:51,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1359620.0, ans=0.125 2024-08-12 00:16:59,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1359620.0, ans=0.1 2024-08-12 00:17:00,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1359720.0, ans=0.125 2024-08-12 00:17:03,222 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 00:17:03,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1359720.0, ans=0.5 2024-08-12 00:17:10,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1359720.0, ans=0.125 2024-08-12 00:17:10,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1359720.0, ans=0.1 2024-08-12 00:17:11,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1359720.0, ans=0.125 2024-08-12 00:17:19,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5550, loss[loss=0.1031, beats_loss=0.00988, ecapa_loss=0.0001752, whisper_loss=0.09152, over 22650.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0001826, whisper_loss=0.09303, over 3951646.87 frames. 
], batch size: 90, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:17:21,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1359820.0, ans=0.125 2024-08-12 00:17:27,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1359820.0, ans=0.09899494936611666 2024-08-12 00:17:35,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=12.0 2024-08-12 00:17:41,970 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:18:14,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.662e+01 3.000e+01 3.511e+01 5.450e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-12 00:18:15,250 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 00:18:29,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-12 00:18:30,760 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.130e-02 2024-08-12 00:18:31,994 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 00:18:34,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1360220.0, ans=0.125 2024-08-12 00:18:53,350 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5600, loss[loss=0.1098, beats_loss=0.009494, ecapa_loss=0.0001887, whisper_loss=0.09841, over 15403.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01126, ecapa_loss=0.0001819, whisper_loss=0.09314, over 3953826.10 frames. 
], batch size: 58, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:18:54,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-12 00:18:55,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1360320.0, ans=0.0 2024-08-12 00:19:03,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1360320.0, ans=0.0 2024-08-12 00:19:15,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1360420.0, ans=0.1 2024-08-12 00:19:25,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1360420.0, ans=0.0 2024-08-12 00:19:33,799 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-12 00:19:47,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2024-08-12 00:20:07,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1360720.0, ans=0.125 2024-08-12 00:20:15,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1360720.0, ans=0.125 2024-08-12 00:20:23,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-12 00:20:24,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5650, loss[loss=0.08306, beats_loss=0.01166, ecapa_loss=0.0001767, whisper_loss=0.06963, over 15135.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.0113, ecapa_loss=0.0001833, whisper_loss=0.09227, over 3953703.05 frames. ], batch size: 60, lr: 6.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:20:34,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1360820.0, ans=0.1 2024-08-12 00:20:37,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1360920.0, ans=0.125 2024-08-12 00:20:42,633 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:20:50,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1361020.0, ans=0.2 2024-08-12 00:21:02,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1361020.0, ans=0.125 2024-08-12 00:21:04,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.708e+01 3.179e+01 3.775e+01 1.197e+02, threshold=6.358e+01, percent-clipped=2.0 2024-08-12 00:21:17,761 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 00:21:30,426 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 00:21:32,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5700, loss[loss=0.1266, beats_loss=0.008071, ecapa_loss=0.0002141, whisper_loss=0.1164, over 16395.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001846, whisper_loss=0.09317, over 3945202.69 frames. ], batch size: 64, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:21:34,344 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 28 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 00:21:42,609 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 00:21:45,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1361420.0, ans=0.0 2024-08-12 00:21:47,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-12 00:21:50,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.55 vs. limit=5.0 2024-08-12 00:21:52,255 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 00:22:03,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1361520.0, ans=0.0 2024-08-12 00:22:07,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2024-08-12 00:22:30,856 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 00:22:36,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1361720.0, ans=0.125 2024-08-12 00:22:40,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5750, loss[loss=0.1059, beats_loss=0.008769, ecapa_loss=0.0001693, whisper_loss=0.09542, over 14771.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01123, ecapa_loss=0.0001856, whisper_loss=0.09351, over 3948886.72 frames. ], batch size: 54, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:23:05,206 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 00:23:16,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.90 vs. 
limit=15.0 2024-08-12 00:23:19,127 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 29 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-12 00:23:19,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-12 00:23:20,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.574e+01 2.789e+01 3.089e+01 4.490e+01, threshold=5.577e+01, percent-clipped=0.0 2024-08-12 00:23:26,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2024-08-12 00:23:49,611 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5800, loss[loss=0.09236, beats_loss=0.01368, ecapa_loss=0.0001609, whisper_loss=0.07707, over 21323.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01117, ecapa_loss=0.0001853, whisper_loss=0.09347, over 3922422.25 frames. ], batch size: 89, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:23:50,119 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:24:10,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1362420.0, ans=0.1 2024-08-12 00:24:16,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-12 00:24:22,550 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
17 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 00:24:38,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1362620.0, ans=0.0 2024-08-12 00:24:58,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5850, loss[loss=0.1008, beats_loss=0.007654, ecapa_loss=0.000245, whisper_loss=0.09066, over 15113.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01124, ecapa_loss=0.0001837, whisper_loss=0.09278, over 3918582.22 frames. ], batch size: 63, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:25:00,994 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-12 00:25:18,783 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 00:25:27,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2024-08-12 00:25:32,327 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 00:25:37,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.515e+01 2.804e+01 3.095e+01 4.578e+01, threshold=5.608e+01, percent-clipped=0.0 2024-08-12 00:25:47,364 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 00:25:48,673 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 11 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 00:26:03,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.12 vs. limit=22.5 2024-08-12 00:26:06,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5900, loss[loss=0.09326, beats_loss=0.01166, ecapa_loss=0.0002078, whisper_loss=0.07952, over 21396.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01132, ecapa_loss=0.0001844, whisper_loss=0.09163, over 3874587.96 frames. ], batch size: 90, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:26:15,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1363320.0, ans=0.0 2024-08-12 00:26:16,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1363320.0, ans=0.125 2024-08-12 00:26:30,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1363420.0, ans=0.1 2024-08-12 00:26:34,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-12 00:26:38,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1363520.0, ans=0.1 2024-08-12 00:26:44,988 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 00:26:52,938 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 13 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 00:27:01,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1363720.0, ans=0.1 2024-08-12 00:27:07,780 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 00:27:11,679 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 00:27:14,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 5950, loss[loss=0.08724, beats_loss=0.01161, ecapa_loss=0.0001813, whisper_loss=0.07381, over 18376.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01137, ecapa_loss=0.0001835, whisper_loss=0.0919, over 3893663.79 frames. ], batch size: 75, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:27:20,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1363820.0, ans=0.125 2024-08-12 00:27:24,251 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 00:27:36,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1363920.0, ans=0.05 2024-08-12 00:27:39,995 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 00:27:53,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.619e+01 2.853e+01 3.292e+01 6.548e+01, threshold=5.706e+01, percent-clipped=1.0 2024-08-12 00:28:06,683 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.184e-01 2024-08-12 00:28:08,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0 2024-08-12 00:28:10,287 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 00:28:17,249 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 00:28:18,550 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 34 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 00:28:18,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1364220.0, ans=0.0 2024-08-12 00:28:22,356 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6000, loss[loss=0.1006, beats_loss=0.01376, ecapa_loss=0.0001588, whisper_loss=0.08529, over 23452.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01134, ecapa_loss=0.0001832, whisper_loss=0.09201, over 3908937.84 frames. ], batch size: 94, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:28:22,356 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 00:29:04,116 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on ASR_libri: loss=0.2569, beats_loss=0, ecapa_loss=0.0006172, whisper_loss=0.2508, over 922467.00 frames. 2024-08-12 00:29:22,543 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on SV_voxceleb1: loss=0.005036, beats_loss=0, ecapa_loss=0.0005036, whisper_loss=0, over 939242.00 frames. 2024-08-12 00:31:25,886 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 00:31:25,890 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 00:31:26,031 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 00:32:04,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1364520.0, ans=0.05 2024-08-12 00:32:10,316 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2024-08-12 00:32:11,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1364620.0, ans=0.0 2024-08-12 00:32:13,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1364620.0, ans=0.1 2024-08-12 00:32:28,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.15 vs. 
limit=22.5 2024-08-12 00:32:33,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1364820.0, ans=0.2 2024-08-12 00:32:34,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6050, loss[loss=0.1078, beats_loss=0.01084, ecapa_loss=0.0001771, whisper_loss=0.09521, over 22892.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01129, ecapa_loss=0.0001833, whisper_loss=0.09269, over 3894913.80 frames. ], batch size: 89, lr: 6.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:32:37,874 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 00:32:49,808 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 00:33:13,419 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 00:33:13,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1365020.0, ans=0.125 2024-08-12 00:33:16,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.639e+01 2.972e+01 3.364e+01 6.267e+01, threshold=5.943e+01, percent-clipped=1.0 2024-08-12 00:33:22,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1365120.0, ans=0.0 2024-08-12 00:33:44,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6100, loss[loss=0.1158, beats_loss=0.009477, ecapa_loss=0.0002051, whisper_loss=0.1043, over 20577.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01124, ecapa_loss=0.0001836, whisper_loss=0.09324, over 3883675.72 frames. 
], batch size: 82, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:34:21,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1365520.0, ans=0.125 2024-08-12 00:34:24,260 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-12 00:34:28,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-08-12 00:34:34,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2024-08-12 00:34:36,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1365620.0, ans=0.0 2024-08-12 00:34:51,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1365720.0, ans=0.1 2024-08-12 00:34:54,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6150, loss[loss=0.1063, beats_loss=0.01028, ecapa_loss=0.000239, whisper_loss=0.09368, over 21021.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01126, ecapa_loss=0.0001835, whisper_loss=0.09295, over 3892266.83 frames. ], batch size: 89, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:34:55,012 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 00:34:56,450 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 00:34:56,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-12 00:35:04,539 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 00:35:10,158 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 00:35:13,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1365920.0, ans=0.2 2024-08-12 00:35:27,712 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 00:35:29,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2024-08-12 00:35:36,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.497e+01 2.771e+01 3.038e+01 4.710e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-12 00:35:44,457 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 00:35:54,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-08-12 00:36:03,273 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6200, loss[loss=0.09093, beats_loss=0.01494, ecapa_loss=0.0001521, whisper_loss=0.07447, over 20879.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01132, ecapa_loss=0.0001836, whisper_loss=0.09199, over 3857017.25 frames. ], batch size: 86, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:36:03,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1366320.0, ans=0.0 2024-08-12 00:36:05,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.76 vs. 
limit=15.0 2024-08-12 00:36:23,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1366420.0, ans=0.125 2024-08-12 00:36:23,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1366420.0, ans=0.0 2024-08-12 00:36:30,027 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 00:36:46,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1366620.0, ans=0.125 2024-08-12 00:36:47,827 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 00:37:00,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1366720.0, ans=0.125 2024-08-12 00:37:04,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1366720.0, ans=0.125 2024-08-12 00:37:09,883 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 00:37:11,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366820.0, ans=0.1 2024-08-12 00:37:12,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6250, loss[loss=0.1053, beats_loss=0.01137, ecapa_loss=0.0001795, whisper_loss=0.09218, over 22488.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01129, ecapa_loss=0.0001835, whisper_loss=0.09223, over 3867678.32 frames. 
], batch size: 92, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:37:13,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1366820.0, ans=0.0 2024-08-12 00:37:21,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1366820.0, ans=0.2 2024-08-12 00:37:22,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1366820.0, ans=0.0 2024-08-12 00:37:23,879 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 00:37:48,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1367020.0, ans=0.0 2024-08-12 00:37:53,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.633e+01 2.869e+01 3.281e+01 7.272e+01, threshold=5.739e+01, percent-clipped=3.0 2024-08-12 00:37:54,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=12.0 2024-08-12 00:37:58,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1367120.0, ans=0.1 2024-08-12 00:38:18,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1367220.0, ans=0.2 2024-08-12 00:38:21,931 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6300, loss[loss=0.09753, beats_loss=0.01257, ecapa_loss=0.0002017, whisper_loss=0.08294, over 21427.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01129, ecapa_loss=0.0001841, whisper_loss=0.09155, over 3868310.11 frames. 
], batch size: 88, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:38:28,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2024-08-12 00:38:30,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1367320.0, ans=0.125 2024-08-12 00:38:46,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1367420.0, ans=0.0 2024-08-12 00:38:53,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1367520.0, ans=0.125 2024-08-12 00:39:03,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1367620.0, ans=0.2 2024-08-12 00:39:14,202 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 19 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 00:39:17,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1367720.0, ans=0.125 2024-08-12 00:39:23,229 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=12.0 2024-08-12 00:39:30,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6350, loss[loss=0.09139, beats_loss=0.01499, ecapa_loss=0.0001617, whisper_loss=0.07478, over 17012.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01134, ecapa_loss=0.0001856, whisper_loss=0.09115, over 3863552.39 frames. ], batch size: 70, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:39:35,494 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
19 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-12 00:39:39,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-12 00:39:41,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1367820.0, ans=0.0 2024-08-12 00:39:54,333 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 00:40:06,171 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 00:40:12,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.594e+01 2.991e+01 3.551e+01 3.558e+02, threshold=5.982e+01, percent-clipped=1.0 2024-08-12 00:40:22,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1368120.0, ans=0.2 2024-08-12 00:40:36,061 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 34 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 00:40:36,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-12 00:40:40,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6400, loss[loss=0.1088, beats_loss=0.01285, ecapa_loss=0.0001897, whisper_loss=0.09405, over 23383.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01122, ecapa_loss=0.0001854, whisper_loss=0.09159, over 3855379.52 frames. 
], batch size: 95, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:40:43,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1368320.0, ans=0.05 2024-08-12 00:40:44,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1368320.0, ans=0.125 2024-08-12 00:41:06,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-12 00:41:39,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1368720.0, ans=0.1 2024-08-12 00:41:43,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1368720.0, ans=10.0 2024-08-12 00:41:49,054 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6450, loss[loss=0.1148, beats_loss=0.01066, ecapa_loss=0.0002047, whisper_loss=0.1021, over 19232.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01126, ecapa_loss=0.0001857, whisper_loss=0.09175, over 3864367.74 frames. ], batch size: 80, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:41:52,068 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 00:41:58,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. 
limit=6.0
2024-08-12 00:42:02,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1368920.0, ans=0.125
2024-08-12 00:42:14,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1368920.0, ans=0.0
2024-08-12 00:42:15,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1369020.0, ans=0.07
2024-08-12 00:42:21,155 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:42:27,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0
2024-08-12 00:42:30,094 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.638e+01 2.996e+01 3.413e+01 4.809e+01, threshold=5.992e+01, percent-clipped=1.0
2024-08-12 00:42:32,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0
2024-08-12 00:42:47,214 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 from AS
2024-08-12 00:42:58,169 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6500, loss[loss=0.09148, beats_loss=0.01261, ecapa_loss=0.0001793, whisper_loss=0.07707, over 22856.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001848, whisper_loss=0.09191, over 3870057.26 frames. ], batch size: 92, lr: 6.34e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:42:58,509 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS
2024-08-12 00:42:59,646 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 00:43:03,759 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 00:43:07,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1369320.0, ans=0.125
2024-08-12 00:43:14,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5
2024-08-12 00:43:16,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1369420.0, ans=0.125
2024-08-12 00:43:19,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2024-08-12 00:43:20,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1369420.0, ans=0.2
2024-08-12 00:43:38,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1369620.0, ans=0.0
2024-08-12 00:44:00,852 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.514e-01
2024-08-12 00:44:07,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6550, loss[loss=0.1219, beats_loss=0.009892, ecapa_loss=0.000172, whisper_loss=0.1103, over 17786.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.0001841, whisper_loss=0.09219, over 3899500.39 frames. ], batch size: 68, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:44:10,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1369820.0, ans=0.0
2024-08-12 00:44:13,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1369820.0, ans=0.1
2024-08-12 00:44:23,706 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS
2024-08-12 00:44:29,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1369920.0, ans=0.125
2024-08-12 00:44:32,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1369920.0, ans=0.0
2024-08-12 00:44:32,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1369920.0, ans=0.05
2024-08-12 00:44:40,417 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 39 from LS+wenet, 19 from Vox, 30 from AS
2024-08-12 00:44:42,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1370020.0, ans=0.07
2024-08-12 00:44:48,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.662e+01 3.000e+01 3.439e+01 5.833e+01, threshold=5.999e+01, percent-clipped=0.0
2024-08-12 00:45:12,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1370220.0, ans=0.1
2024-08-12 00:45:14,990 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 28 from Vox, 39 from AS
2024-08-12 00:45:16,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6600, loss[loss=0.08669, beats_loss=0.01263, ecapa_loss=0.0002168, whisper_loss=0.07189, over 20821.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01123, ecapa_loss=0.0001845, whisper_loss=0.09229, over 3931925.42 frames. ], batch size: 87, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:45:24,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1370320.0, ans=0.125
2024-08-12 00:45:26,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1370320.0, ans=0.2
2024-08-12 00:45:28,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1370420.0, ans=0.0
2024-08-12 00:45:37,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1370420.0, ans=0.2
2024-08-12 00:45:50,790 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 16 from Vox, 31 from AS
2024-08-12 00:46:03,460 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:46:04,536 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 00:46:10,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-08-12 00:46:18,689 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 from AS
2024-08-12 00:46:19,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1370720.0, ans=0.125
2024-08-12 00:46:25,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6650, loss[loss=0.09564, beats_loss=0.01348, ecapa_loss=0.0001826, whisper_loss=0.08034, over 18909.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.000184, whisper_loss=0.09298, over 3958532.39 frames. ], batch size: 77, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:46:37,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1370820.0, ans=0.1
2024-08-12 00:46:38,445 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 from AS
2024-08-12 00:46:38,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1370920.0, ans=0.1
2024-08-12 00:46:41,123 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 from AS
2024-08-12 00:46:46,563 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS
2024-08-12 00:46:46,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1370920.0, ans=0.05
2024-08-12 00:46:51,801 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 32 from LS+wenet, 14 from Vox, 22 from AS
2024-08-12 00:47:06,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.593e+01 2.812e+01 3.124e+01 4.169e+01, threshold=5.623e+01, percent-clipped=0.0
2024-08-12 00:47:07,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1371120.0, ans=0.1
2024-08-12 00:47:12,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1371120.0, ans=0.125
2024-08-12 00:47:32,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1371220.0, ans=0.125
2024-08-12 00:47:34,526 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6700, loss[loss=0.1149, beats_loss=0.009048, ecapa_loss=0.0001701, whisper_loss=0.1042, over 18340.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01115, ecapa_loss=0.000184, whisper_loss=0.09328, over 3942931.31 frames. ], batch size: 69, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:47:41,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0
2024-08-12 00:47:53,936 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 from AS
2024-08-12 00:47:54,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1371420.0, ans=0.125
2024-08-12 00:47:56,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2024-08-12 00:47:58,344 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 from AS
2024-08-12 00:48:00,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1371420.0, ans=0.125
2024-08-12 00:48:14,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1371520.0, ans=0.0
2024-08-12 00:48:44,874 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6750, loss[loss=0.1174, beats_loss=0.01111, ecapa_loss=0.0002097, whisper_loss=0.1042, over 20979.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01118, ecapa_loss=0.0001844, whisper_loss=0.09326, over 3921733.52 frames. ], batch size: 88, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:49:03,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0
2024-08-12 00:49:03,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-08-12 00:49:04,253 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 29 from Vox, 23 from AS
2024-08-12 00:49:17,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1372020.0, ans=0.125
2024-08-12 00:49:22,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1372020.0, ans=0.0
2024-08-12 00:49:26,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.541e+01 2.925e+01 3.464e+01 4.634e+01, threshold=5.851e+01, percent-clipped=0.0
2024-08-12 00:49:34,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1372120.0, ans=0.125
2024-08-12 00:49:43,458 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 from AS
2024-08-12 00:49:45,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1372220.0, ans=0.2
2024-08-12 00:49:54,332 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6800, loss[loss=0.1178, beats_loss=0.01228, ecapa_loss=0.0001876, whisper_loss=0.1036, over 21492.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.0001862, whisper_loss=0.09297, over 3907783.52 frames. ], batch size: 86, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:50:08,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1372420.0, ans=0.2
2024-08-12 00:50:10,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0
2024-08-12 00:50:15,502 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 from AS
2024-08-12 00:50:19,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1372420.0, ans=0.125
2024-08-12 00:50:26,513 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 from AS
2024-08-12 00:50:29,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1372520.0, ans=0.125
2024-08-12 00:50:31,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1372520.0, ans=0.125
2024-08-12 00:50:37,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1372620.0, ans=0.125
2024-08-12 00:51:03,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6850, loss[loss=0.115, beats_loss=0.01004, ecapa_loss=0.0001868, whisper_loss=0.1031, over 22848.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01112, ecapa_loss=0.0001863, whisper_loss=0.09286, over 3888912.94 frames. ], batch size: 93, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:51:09,490 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS
2024-08-12 00:51:10,820 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 19 from Vox, 18 from AS
2024-08-12 00:51:12,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1372820.0, ans=0.5
2024-08-12 00:51:18,828 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 from AS
2024-08-12 00:51:21,249 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 26 from LS+wenet, 28 from Vox, 42 from AS
2024-08-12 00:51:39,184 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 from AS
2024-08-12 00:51:39,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1373020.0, ans=0.0
2024-08-12 00:51:44,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.602e+01 2.969e+01 3.307e+01 6.186e+01, threshold=5.938e+01, percent-clipped=1.0
2024-08-12 00:52:12,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6900, loss[loss=0.1084, beats_loss=0.01016, ecapa_loss=0.0002073, whisper_loss=0.09614, over 21756.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0001859, whisper_loss=0.09265, over 3896379.36 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:52:13,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=12.0
2024-08-12 00:52:20,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1373320.0, ans=0.0
2024-08-12 00:52:24,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1373320.0, ans=0.1
2024-08-12 00:52:29,911 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 from AS
2024-08-12 00:52:37,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1373420.0, ans=0.1
2024-08-12 00:52:39,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1373420.0, ans=0.125
2024-08-12 00:52:42,833 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS
2024-08-12 00:53:02,960 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 from AS
2024-08-12 00:53:06,981 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 11 from Vox, 32 from AS
2024-08-12 00:53:12,731 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:53:22,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1373820.0, ans=0.1
2024-08-12 00:53:23,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 6950, loss[loss=0.1236, beats_loss=0.01046, ecapa_loss=0.0001699, whisper_loss=0.1114, over 22760.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01124, ecapa_loss=0.000184, whisper_loss=0.09252, over 3900604.22 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:53:39,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1373920.0, ans=0.125
2024-08-12 00:54:00,306 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS
2024-08-12 00:54:04,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1374120.0, ans=0.125
2024-08-12 00:54:05,815 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.522e+01 2.749e+01 3.045e+01 4.953e+01, threshold=5.497e+01, percent-clipped=0.0
2024-08-12 00:54:13,273 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 16 from LS+wenet, 30 from Vox, 40 from AS
2024-08-12 00:54:33,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7000, loss[loss=0.09237, beats_loss=0.01324, ecapa_loss=0.000206, whisper_loss=0.07707, over 20645.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01123, ecapa_loss=0.0001848, whisper_loss=0.09214, over 3878222.41 frames. ], batch size: 86, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:54:34,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=12.0
2024-08-12 00:54:46,558 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 00:54:50,859 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS
2024-08-12 00:54:55,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1374420.0, ans=0.5
2024-08-12 00:54:58,964 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 from AS
2024-08-12 00:55:19,540 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS
2024-08-12 00:55:22,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2024-08-12 00:55:24,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-08-12 00:55:33,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=1374720.0, ans=12.0
2024-08-12 00:55:41,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7050, loss[loss=0.1221, beats_loss=0.01063, ecapa_loss=0.0002021, whisper_loss=0.1094, over 14383.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01118, ecapa_loss=0.0001844, whisper_loss=0.09321, over 3907707.20 frames. ], batch size: 58, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:55:49,317 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:55:50,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1374820.0, ans=0.125
2024-08-12 00:55:51,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0
2024-08-12 00:56:06,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1374920.0, ans=0.125
2024-08-12 00:56:16,284 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 from AS
2024-08-12 00:56:23,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.564e+01 2.939e+01 3.594e+01 1.844e+02, threshold=5.878e+01, percent-clipped=7.0
2024-08-12 00:56:23,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1375120.0, ans=0.125
2024-08-12 00:56:44,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1375220.0, ans=0.125
2024-08-12 00:56:50,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7100, loss[loss=0.1251, beats_loss=0.009064, ecapa_loss=0.0001926, whisper_loss=0.1142, over 23075.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01116, ecapa_loss=0.0001836, whisper_loss=0.09326, over 3907083.58 frames. ], batch size: 89, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:57:14,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1375420.0, ans=0.1
2024-08-12 00:57:40,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1375620.0, ans=0.125
2024-08-12 00:57:43,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1375620.0, ans=0.125
2024-08-12 00:57:59,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7150, loss[loss=0.1174, beats_loss=0.009826, ecapa_loss=0.0001792, whisper_loss=0.1057, over 23279.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0112, ecapa_loss=0.0001831, whisper_loss=0.09309, over 3932797.68 frames. ], batch size: 91, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:58:09,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1375820.0, ans=0.125
2024-08-12 00:58:20,582 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 28 from Vox, 32 from AS
2024-08-12 00:58:26,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1376020.0, ans=0.04949747468305833
2024-08-12 00:58:42,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.592e+01 2.864e+01 3.293e+01 5.608e+01, threshold=5.729e+01, percent-clipped=0.0
2024-08-12 00:59:02,857 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 from AS
2024-08-12 00:59:09,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7200, loss[loss=0.1019, beats_loss=0.01117, ecapa_loss=0.0001436, whisper_loss=0.08931, over 20357.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001831, whisper_loss=0.09348, over 3950580.90 frames. ], batch size: 77, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:59:36,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0
2024-08-12 00:59:42,991 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 from AS
2024-08-12 00:59:48,523 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 00:59:49,788 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 from AS
2024-08-12 01:00:17,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7250, loss[loss=0.117, beats_loss=0.01022, ecapa_loss=0.000175, whisper_loss=0.105, over 22390.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01106, ecapa_loss=0.0001851, whisper_loss=0.09383, over 3951236.40 frames. ], batch size: 89, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:00:32,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2024-08-12 01:00:41,370 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-12 01:00:46,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1377020.0, ans=0.125
2024-08-12 01:00:56,844 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 15 from Vox, 44 from AS
2024-08-12 01:00:59,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.509e+01 2.818e+01 3.163e+01 4.594e+01, threshold=5.637e+01, percent-clipped=0.0
2024-08-12 01:01:16,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1377220.0, ans=0.125
2024-08-12 01:01:27,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7300, loss[loss=0.1323, beats_loss=0.009264, ecapa_loss=0.0002003, whisper_loss=0.121, over 22159.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01104, ecapa_loss=0.0001854, whisper_loss=0.09401, over 3919862.28 frames. ], batch size: 89, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:01:52,671 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 from AS
2024-08-12 01:01:56,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1377520.0, ans=0.125
2024-08-12 01:02:02,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5
2024-08-12 01:02:30,358 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-12 01:02:32,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1377720.0, ans=0.0
2024-08-12 01:02:33,335 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 from AS
2024-08-12 01:02:37,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7350, loss[loss=0.09281, beats_loss=0.01208, ecapa_loss=0.0001627, whisper_loss=0.07911, over 20040.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01105, ecapa_loss=0.0001866, whisper_loss=0.09409, over 3918922.37 frames. ], batch size: 78, lr: 6.32e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:02:42,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=1377820.0, ans=0.02
2024-08-12 01:02:43,973 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 from AS
2024-08-12 01:03:00,557 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 01:03:09,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1378020.0, ans=0.05
2024-08-12 01:03:15,095 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 from AS
2024-08-12 01:03:18,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.545e+01 2.938e+01 3.274e+01 5.414e+01, threshold=5.876e+01, percent-clipped=0.0
2024-08-12 01:03:34,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1378220.0, ans=0.0
2024-08-12 01:03:46,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7400, loss[loss=0.07653, beats_loss=0.0123, ecapa_loss=0.0002154, whisper_loss=0.06208, over 15636.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01116, ecapa_loss=0.0001852, whisper_loss=0.09336, over 3917255.57 frames. ], batch size: 67, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:04:19,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0
2024-08-12 01:04:20,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1378520.0, ans=0.1
2024-08-12 01:04:20,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1378520.0, ans=0.125
2024-08-12 01:04:21,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1378520.0, ans=0.2
2024-08-12 01:04:30,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1378620.0, ans=0.125
2024-08-12 01:04:47,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1378720.0, ans=0.2
2024-08-12 01:04:54,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7450, loss[loss=0.09035, beats_loss=0.008637, ecapa_loss=0.0001643, whisper_loss=0.08007, over 16099.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01121, ecapa_loss=0.000186, whisper_loss=0.09281, over 3904829.20 frames. ], batch size: 58, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:04:55,349 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 20 from Vox, 38 from AS
2024-08-12 01:05:04,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1378820.0, ans=0.0
2024-08-12 01:05:14,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1378920.0, ans=0.125
2024-08-12 01:05:14,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1378920.0, ans=0.125
2024-08-12 01:05:29,186 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 from AS
2024-08-12 01:05:36,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.504e+01 2.763e+01 3.240e+01 5.325e+01, threshold=5.527e+01, percent-clipped=0.0
2024-08-12 01:05:47,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1379120.0, ans=0.125
2024-08-12 01:05:50,520 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 13 from Vox, 29 from AS
2024-08-12 01:05:55,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1379220.0, ans=0.1
2024-08-12 01:05:58,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0
2024-08-12 01:05:59,027 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 from AS
2024-08-12 01:06:00,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1379220.0, ans=0.02
2024-08-12 01:06:04,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7500, loss[loss=0.1069, beats_loss=0.0106, ecapa_loss=0.0002152, whisper_loss=0.09419, over 21735.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01117, ecapa_loss=0.0001843, whisper_loss=0.09339, over 3889444.02 frames. ], batch size: 91, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:06:05,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1379320.0, ans=0.125
2024-08-12 01:06:18,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379420.0, ans=0.125
2024-08-12 01:06:22,551 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 26 from LS+wenet, 13 from Vox, 43 from AS
2024-08-12 01:06:26,671 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS
2024-08-12 01:06:28,412 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 from AS
2024-08-12 01:06:48,705 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 from AS
2024-08-12 01:06:55,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1379620.0, ans=0.125
2024-08-12 01:07:10,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1379720.0, ans=0.1
2024-08-12 01:07:16,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7550, loss[loss=0.0913, beats_loss=0.01342, ecapa_loss=0.0001851, whisper_loss=0.07602, over 20134.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001833, whisper_loss=0.09351, over 3911923.63 frames. ], batch size: 81, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:07:28,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1379820.0, ans=10.0
2024-08-12 01:07:34,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1379920.0, ans=0.0
2024-08-12 01:07:48,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1380020.0, ans=0.04949747468305833
2024-08-12 01:07:55,771 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 from AS
2024-08-12 01:07:59,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.522e+01 2.796e+01 3.153e+01 8.804e+01, threshold=5.592e+01, percent-clipped=1.0
2024-08-12 01:08:11,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1380120.0, ans=0.035
2024-08-12 01:08:12,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0
2024-08-12 01:08:28,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7600, loss[loss=0.1126, beats_loss=0.01011, ecapa_loss=0.000151, whisper_loss=0.101, over 19538.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01121, ecapa_loss=0.000185, whisper_loss=0.09328, over 3918733.12 frames. ], batch size: 74, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:08:30,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1380320.0, ans=0.2
2024-08-12 01:08:33,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1380320.0, ans=0.125
2024-08-12 01:08:42,905 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.387e+02
2024-08-12 01:09:10,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0
2024-08-12 01:09:17,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1380620.0, ans=0.1
2024-08-12 01:09:37,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1380720.0, ans=0.125
2024-08-12 01:09:41,899 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 01:09:44,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7650, loss[loss=0.1132, beats_loss=0.01024, ecapa_loss=0.000181, whisper_loss=0.1011, over 22492.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01117, ecapa_loss=0.0001846, whisper_loss=0.09275, over 3917222.28 frames. ], batch size: 90, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:09:49,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0
2024-08-12 01:09:53,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1380820.0, ans=0.2
2024-08-12 01:10:10,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1380920.0, ans=0.2
2024-08-12 01:10:12,767 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 12 from Vox, 34 from AS
2024-08-12 01:10:21,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1381020.0, ans=0.125
2024-08-12 01:10:26,666 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 from AS
2024-08-12 01:10:31,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.632e+01 2.933e+01 3.294e+01 6.262e+01, threshold=5.865e+01, percent-clipped=1.0
2024-08-12 01:10:31,608 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-12 01:10:40,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1381120.0, ans=0.2
2024-08-12 01:10:45,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1381220.0, ans=0.0
2024-08-12 01:11:02,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7700, loss[loss=0.07677, beats_loss=0.01437, ecapa_loss=0.0001448, whisper_loss=0.06096, over 16058.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01114, ecapa_loss=0.0001859, whisper_loss=0.09241, over 3863373.68 frames. ], batch size: 64, lr: 6.31e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:11:04,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1381320.0, ans=0.0
2024-08-12 01:11:17,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1381420.0, ans=0.125
2024-08-12 01:11:19,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1381420.0, ans=0.125
2024-08-12 01:11:27,972 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts.
27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 01:11:42,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1381520.0, ans=0.125 2024-08-12 01:12:14,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-12 01:12:16,427 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7750, loss[loss=0.08563, beats_loss=0.01184, ecapa_loss=0.0002346, whisper_loss=0.07145, over 17079.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01121, ecapa_loss=0.0001845, whisper_loss=0.09225, over 3902887.37 frames. ], batch size: 70, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:12:19,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1381820.0, ans=0.0 2024-08-12 01:12:28,141 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 01:12:29,763 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-12 01:12:45,040 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 01:12:49,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-08-12 01:12:54,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1382020.0, ans=0.0 2024-08-12 01:13:00,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.543e+01 2.861e+01 3.273e+01 8.260e+01, threshold=5.723e+01, percent-clipped=1.0 2024-08-12 01:13:02,559 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 01:13:06,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1382120.0, ans=15.0 2024-08-12 01:13:12,424 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:13:23,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1382220.0, ans=0.1 2024-08-12 01:13:29,933 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 01:13:31,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7800, loss[loss=0.0937, beats_loss=0.01131, ecapa_loss=0.0001579, whisper_loss=0.08081, over 14504.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01114, ecapa_loss=0.0001854, whisper_loss=0.09246, over 3865323.57 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:13:34,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1382320.0, ans=0.5 2024-08-12 01:13:35,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1382320.0, ans=0.125 2024-08-12 01:13:47,435 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-12 01:13:55,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1382420.0, ans=0.125 2024-08-12 01:14:01,773 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 01:14:08,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1382520.0, ans=0.2 2024-08-12 01:14:22,435 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 01:14:40,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2024-08-12 01:14:41,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1382720.0, ans=0.1 2024-08-12 01:14:44,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1382820.0, ans=0.0 2024-08-12 01:14:45,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7850, loss[loss=0.09778, beats_loss=0.01397, ecapa_loss=0.0001607, whisper_loss=0.0822, over 21297.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01111, ecapa_loss=0.0001867, whisper_loss=0.09302, over 3884567.38 frames. ], batch size: 85, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:14:50,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1382820.0, ans=0.1 2024-08-12 01:14:54,719 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 01:14:56,046 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 01:15:11,626 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
33 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 01:15:21,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1383020.0, ans=0.0 2024-08-12 01:15:29,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.565e+01 2.814e+01 3.165e+01 4.880e+01, threshold=5.628e+01, percent-clipped=0.0 2024-08-12 01:15:39,811 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 01:15:41,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-12 01:15:42,667 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 01:15:46,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-12 01:15:57,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.59 vs. limit=6.0 2024-08-12 01:15:58,347 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7900, loss[loss=0.1251, beats_loss=0.007761, ecapa_loss=0.0001965, whisper_loss=0.1154, over 22358.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001851, whisper_loss=0.09261, over 3864967.53 frames. ], batch size: 85, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:16:00,130 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 01:16:05,111 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 01:16:06,372 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 01:16:28,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1383520.0, ans=0.2 2024-08-12 01:16:54,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1383620.0, ans=0.2 2024-08-12 01:16:55,422 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 01:16:56,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2024-08-12 01:16:58,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1383720.0, ans=0.0 2024-08-12 01:17:12,862 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 7950, loss[loss=0.1279, beats_loss=0.00883, ecapa_loss=0.0001734, whisper_loss=0.1173, over 17759.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01114, ecapa_loss=0.0001857, whisper_loss=0.09331, over 3898839.99 frames. ], batch size: 67, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:17:24,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1383820.0, ans=0.125 2024-08-12 01:17:24,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383820.0, ans=0.1 2024-08-12 01:17:29,667 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 01:17:38,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1383920.0, ans=0.0 2024-08-12 01:17:41,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384020.0, ans=0.1 2024-08-12 01:17:45,800 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 01:17:48,715 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 01:17:52,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1384020.0, ans=0.0 2024-08-12 01:17:54,550 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 01:17:57,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.551e+01 2.931e+01 3.391e+01 6.201e+01, threshold=5.862e+01, percent-clipped=1.0 2024-08-12 01:18:26,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8000, loss[loss=0.1037, beats_loss=0.01123, ecapa_loss=0.0002039, whisper_loss=0.09045, over 21930.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.09287, over 3859444.86 frames. ], batch size: 89, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:18:43,856 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-08-12 01:18:46,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=15.0 2024-08-12 01:18:49,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1384420.0, ans=15.0 2024-08-12 01:19:06,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1384520.0, ans=0.125 2024-08-12 01:19:24,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1384720.0, ans=0.2 2024-08-12 01:19:39,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8050, loss[loss=0.07544, beats_loss=0.01405, ecapa_loss=0.0001897, whisper_loss=0.0595, over 20706.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01113, ecapa_loss=0.0001849, whisper_loss=0.09306, over 3879985.89 frames. ], batch size: 90, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:19:44,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1384820.0, ans=0.125 2024-08-12 01:19:47,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384820.0, ans=0.1 2024-08-12 01:19:51,224 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 01:20:06,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1385020.0, ans=0.2 2024-08-12 01:20:12,218 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 01:20:22,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1385120.0, ans=0.125 2024-08-12 01:20:22,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.30 vs. 
limit=15.0 2024-08-12 01:20:22,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.542e+01 2.903e+01 3.299e+01 4.788e+01, threshold=5.807e+01, percent-clipped=0.0 2024-08-12 01:20:27,554 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:20:35,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1385120.0, ans=0.2 2024-08-12 01:20:41,941 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 01:20:51,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8100, loss[loss=0.09832, beats_loss=0.01242, ecapa_loss=0.0001471, whisper_loss=0.08442, over 13973.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001852, whisper_loss=0.09271, over 3850632.09 frames. ], batch size: 55, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:20:56,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1385320.0, ans=0.0 2024-08-12 01:21:07,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1385420.0, ans=0.125 2024-08-12 01:21:12,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1385420.0, ans=0.0 2024-08-12 01:21:14,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2024-08-12 01:21:17,925 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 01:21:20,837 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
21 from LS+wenet, 22 from Vox, 14 fro AS 2024-08-12 01:21:33,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-12 01:21:36,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1385620.0, ans=0.1 2024-08-12 01:21:41,082 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 01:21:49,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1385720.0, ans=0.2 2024-08-12 01:21:50,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-12 01:21:55,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1385720.0, ans=0.0 2024-08-12 01:22:04,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8150, loss[loss=0.1198, beats_loss=0.01198, ecapa_loss=0.000159, whisper_loss=0.1062, over 18789.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01101, ecapa_loss=0.0001862, whisper_loss=0.09372, over 3871380.07 frames. ], batch size: 73, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:22:06,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1385820.0, ans=0.125 2024-08-12 01:22:13,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1385820.0, ans=0.05 2024-08-12 01:22:16,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1385820.0, ans=0.2 2024-08-12 01:22:23,155 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 01:22:24,631 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 01:22:33,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1386020.0, ans=0.2 2024-08-12 01:22:35,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1386020.0, ans=0.1 2024-08-12 01:22:35,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1386020.0, ans=0.0 2024-08-12 01:22:47,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.599e+01 2.928e+01 3.345e+01 4.607e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:22:51,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1386120.0, ans=0.125 2024-08-12 01:22:55,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-12 01:23:03,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-12 01:23:04,613 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 01:23:05,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1386220.0, ans=0.125 2024-08-12 01:23:08,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1386220.0, ans=0.125 2024-08-12 01:23:17,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8200, loss[loss=0.1117, beats_loss=0.009228, ecapa_loss=0.0001512, whisper_loss=0.101, over 18000.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01105, ecapa_loss=0.0001868, whisper_loss=0.09353, over 3856451.98 frames. ], batch size: 66, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:23:28,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1386320.0, ans=0.125 2024-08-12 01:23:30,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1386320.0, ans=0.1 2024-08-12 01:23:39,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1386420.0, ans=0.0 2024-08-12 01:24:24,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1386720.0, ans=0.2 2024-08-12 01:24:31,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1386820.0, ans=0.125 2024-08-12 01:24:31,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1386820.0, ans=0.125 2024-08-12 01:24:32,311 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8250, loss[loss=0.1102, beats_loss=0.01211, ecapa_loss=0.0001435, whisper_loss=0.09666, over 16682.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01109, ecapa_loss=0.0001864, whisper_loss=0.09358, over 3877653.40 frames. ], batch size: 61, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:24:41,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-12 01:24:50,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1386920.0, ans=0.125 2024-08-12 01:25:09,099 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 01:25:16,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.606e+01 2.891e+01 3.345e+01 5.457e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-12 01:25:16,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1387120.0, ans=0.125 2024-08-12 01:25:25,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1387120.0, ans=0.0 2024-08-12 01:25:33,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1387220.0, ans=0.125 2024-08-12 01:25:42,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-12 01:25:45,248 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 12 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 01:25:46,389 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8300, loss[loss=0.07188, beats_loss=0.01167, ecapa_loss=0.0002061, whisper_loss=0.05815, over 13382.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001853, whisper_loss=0.09322, over 3901446.05 frames. 
], batch size: 58, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:26:13,610 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 01:26:23,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1387520.0, ans=0.0 2024-08-12 01:26:30,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1387620.0, ans=0.1 2024-08-12 01:26:58,705 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 01:27:02,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8350, loss[loss=0.09977, beats_loss=0.01268, ecapa_loss=0.0001899, whisper_loss=0.08519, over 18318.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01124, ecapa_loss=0.000186, whisper_loss=0.09187, over 3889170.78 frames. ], batch size: 79, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:27:07,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1387820.0, ans=0.0 2024-08-12 01:27:09,746 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 01:27:22,122 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 01:27:22,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-08-12 01:27:28,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.66 vs. 
limit=15.0 2024-08-12 01:27:47,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.752e+01 3.106e+01 3.684e+01 1.573e+02, threshold=6.213e+01, percent-clipped=3.0 2024-08-12 01:27:55,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-12 01:28:03,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=1388220.0, ans=0.2 2024-08-12 01:28:05,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1388220.0, ans=0.125 2024-08-12 01:28:16,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8400, loss[loss=0.1048, beats_loss=0.01182, ecapa_loss=0.0001683, whisper_loss=0.09132, over 23258.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01123, ecapa_loss=0.0001853, whisper_loss=0.09174, over 3882158.47 frames. ], batch size: 94, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:28:22,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1388320.0, ans=0.0 2024-08-12 01:28:23,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2024-08-12 01:28:25,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1388320.0, ans=0.09899494936611666 2024-08-12 01:28:42,172 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 01:28:51,237 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 01:29:10,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1388620.0, ans=0.2 2024-08-12 01:29:20,797 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2024-08-12 01:29:29,436 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8450, loss[loss=0.1103, beats_loss=0.01111, ecapa_loss=0.000186, whisper_loss=0.09728, over 23353.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0001856, whisper_loss=0.09264, over 3894483.49 frames. ], batch size: 95, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:29:33,862 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 01:29:34,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1388820.0, ans=0.2 2024-08-12 01:29:47,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1388920.0, ans=0.125 2024-08-12 01:29:48,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1388920.0, ans=0.0 2024-08-12 01:29:50,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-12 01:30:01,261 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 01:30:06,934 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 01:30:12,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.661e+01 3.023e+01 3.413e+01 6.376e+01, threshold=6.047e+01, percent-clipped=1.0 2024-08-12 01:30:22,691 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 01:30:22,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1389120.0, ans=0.5 2024-08-12 01:30:33,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1389220.0, ans=0.125 2024-08-12 01:30:39,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-08-12 01:30:40,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8500, loss[loss=0.1042, beats_loss=0.01183, ecapa_loss=0.0001492, whisper_loss=0.09084, over 15983.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001849, whisper_loss=0.09295, over 3898599.77 frames. ], batch size: 60, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:30:50,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1389320.0, ans=0.125 2024-08-12 01:30:51,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.21 vs. limit=10.0 2024-08-12 01:30:53,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1389420.0, ans=0.0 2024-08-12 01:30:57,869 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 01:31:16,033 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
22 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 01:31:19,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1389520.0, ans=0.1 2024-08-12 01:31:27,418 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 01:31:28,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2024-08-12 01:31:33,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1389620.0, ans=0.125 2024-08-12 01:31:51,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1389820.0, ans=0.1 2024-08-12 01:31:52,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8550, loss[loss=0.1193, beats_loss=0.009346, ecapa_loss=0.0001659, whisper_loss=0.1082, over 16237.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01106, ecapa_loss=0.0001856, whisper_loss=0.09281, over 3883813.32 frames. ], batch size: 62, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:31:54,455 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 01:31:54,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1389820.0, ans=0.0 2024-08-12 01:31:56,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-12 01:32:07,107 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
20 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 01:32:08,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1389920.0, ans=0.1 2024-08-12 01:32:17,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1389920.0, ans=0.0 2024-08-12 01:32:17,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1389920.0, ans=0.125 2024-08-12 01:32:18,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1389920.0, ans=0.2 2024-08-12 01:32:19,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-12 01:32:21,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1390020.0, ans=0.125 2024-08-12 01:32:37,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.566e+01 2.875e+01 3.249e+01 7.628e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 01:32:44,229 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 01:32:46,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1390120.0, ans=0.125 2024-08-12 01:32:57,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1390220.0, ans=0.0 2024-08-12 01:33:01,583 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 01:33:03,801 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8600, loss[loss=0.1061, beats_loss=0.01242, ecapa_loss=0.0001952, whisper_loss=0.09172, over 22507.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.0001853, whisper_loss=0.09279, over 3893302.15 frames. ], batch size: 92, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:33:04,005 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 01:33:14,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-12 01:33:22,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1390420.0, ans=0.2 2024-08-12 01:33:25,777 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 01:33:32,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-12 01:33:49,006 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 01:34:09,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1390720.0, ans=0.125 2024-08-12 01:34:13,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1390820.0, ans=0.0 2024-08-12 01:34:14,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8650, loss[loss=0.09718, beats_loss=0.01331, ecapa_loss=0.0001874, whisper_loss=0.08199, over 22149.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01108, ecapa_loss=0.0001856, whisper_loss=0.0928, over 3903927.15 frames. ], batch size: 92, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:34:18,645 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 01:34:20,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1390820.0, ans=0.1 2024-08-12 01:34:25,642 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 01:34:41,893 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 01:34:43,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1391020.0, ans=0.125 2024-08-12 01:34:57,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.624e+01 3.118e+01 3.764e+01 6.887e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-12 01:34:59,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1391120.0, ans=0.125 2024-08-12 01:35:15,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1391220.0, ans=0.0 2024-08-12 01:35:25,317 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8700, loss[loss=0.113, beats_loss=0.01016, ecapa_loss=0.0001973, whisper_loss=0.1009, over 21856.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001853, whisper_loss=0.09263, over 3886689.41 frames. ], batch size: 89, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:35:36,905 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 01:35:44,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1391420.0, ans=0.125 2024-08-12 01:35:46,795 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 01:36:05,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1391520.0, ans=0.0 2024-08-12 01:36:07,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1391520.0, ans=0.2 2024-08-12 01:36:12,441 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 01:36:18,158 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 01:36:27,217 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 01:36:30,524 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 01:36:39,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8750, loss[loss=0.07306, beats_loss=0.01308, ecapa_loss=0.0001569, whisper_loss=0.05841, over 14634.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001845, whisper_loss=0.09246, over 3868051.82 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:36:43,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1391820.0, ans=0.1 2024-08-12 01:37:03,654 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 01:37:03,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1391920.0, ans=0.0 2024-08-12 01:37:11,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1392020.0, ans=0.125 2024-08-12 01:37:16,496 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 01:37:25,929 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.651e+01 2.928e+01 3.365e+01 6.201e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:37:54,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8800, loss[loss=0.093, beats_loss=0.01301, ecapa_loss=0.000185, whisper_loss=0.07815, over 19921.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01114, ecapa_loss=0.0001845, whisper_loss=0.09217, over 3867558.36 frames. ], batch size: 81, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:38:09,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1392420.0, ans=0.125 2024-08-12 01:38:24,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1392520.0, ans=0.0 2024-08-12 01:39:08,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8850, loss[loss=0.1031, beats_loss=0.01175, ecapa_loss=0.0001424, whisper_loss=0.08991, over 14496.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001847, whisper_loss=0.0922, over 3890738.40 frames. ], batch size: 54, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:39:28,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1392920.0, ans=0.125 2024-08-12 01:39:29,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1392920.0, ans=0.125 2024-08-12 01:39:34,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=15.0 2024-08-12 01:39:37,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1393020.0, ans=0.2 2024-08-12 01:39:53,361 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.605e+01 2.898e+01 3.315e+01 6.590e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-12 01:39:56,158 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 01:39:58,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1393120.0, ans=0.0 2024-08-12 01:40:04,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-08-12 01:40:06,271 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 01:40:11,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1393220.0, ans=0.0 2024-08-12 01:40:20,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8900, loss[loss=0.08922, beats_loss=0.0134, ecapa_loss=0.000154, whisper_loss=0.07428, over 20274.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.000185, whisper_loss=0.09207, over 3896628.74 frames. ], batch size: 83, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:40:20,681 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 01:40:22,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1393320.0, ans=0.1 2024-08-12 01:40:31,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1393320.0, ans=0.07 2024-08-12 01:40:34,742 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 01:40:55,639 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 01:40:59,929 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 01:41:02,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1393620.0, ans=0.0 2024-08-12 01:41:07,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1393620.0, ans=0.125 2024-08-12 01:41:11,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1393620.0, ans=0.0 2024-08-12 01:41:12,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1393620.0, ans=0.2 2024-08-12 01:41:31,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 8950, loss[loss=0.08725, beats_loss=0.01139, ecapa_loss=0.0002171, whisper_loss=0.07369, over 21071.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01122, ecapa_loss=0.0001861, whisper_loss=0.0919, over 3877324.97 frames. ], batch size: 91, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:41:35,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=15.0 2024-08-12 01:41:50,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1393920.0, ans=0.0 2024-08-12 01:42:09,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1394020.0, ans=0.2 2024-08-12 01:42:11,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1394120.0, ans=0.125 2024-08-12 01:42:13,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.694e+01 3.111e+01 3.699e+01 1.037e+02, threshold=6.222e+01, percent-clipped=1.0 2024-08-12 01:42:19,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1394120.0, ans=0.0 2024-08-12 01:42:21,644 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 01:42:38,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9000, loss[loss=0.1239, beats_loss=0.009632, ecapa_loss=0.0002052, whisper_loss=0.1123, over 22875.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0112, ecapa_loss=0.0001868, whisper_loss=0.09203, over 3888803.00 frames. ], batch size: 88, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:42:38,980 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 01:43:16,657 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006076, whisper_loss=0.2507, over 922467.00 frames. 2024-08-12 01:43:34,628 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on SV_voxceleb1: loss=0.005114, beats_loss=0, ecapa_loss=0.0005114, whisper_loss=0, over 939242.00 frames. 
2024-08-12 01:44:28,736 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8349, 0.9077, 1.1203, 0.6486, 0.7989, 0.9899, 0.7842, 0.7035], device='cuda:1') 2024-08-12 01:45:19,132 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 01:45:19,136 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 01:45:33,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2024-08-12 01:46:17,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1394720.0, ans=0.0 2024-08-12 01:46:28,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9050, loss[loss=0.1276, beats_loss=0.009744, ecapa_loss=0.0002292, whisper_loss=0.1156, over 22752.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001869, whisper_loss=0.09242, over 3871949.36 frames. ], batch size: 94, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:46:32,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-12 01:46:49,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0 2024-08-12 01:46:49,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs. 
limit=10.0 2024-08-12 01:46:51,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1394920.0, ans=0.125 2024-08-12 01:46:53,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1394920.0, ans=0.0 2024-08-12 01:47:11,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.578e+01 2.935e+01 3.281e+01 5.128e+01, threshold=5.870e+01, percent-clipped=0.0 2024-08-12 01:47:37,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9100, loss[loss=0.1228, beats_loss=0.00888, ecapa_loss=0.0001736, whisper_loss=0.1122, over 17216.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01116, ecapa_loss=0.000187, whisper_loss=0.09195, over 3843016.69 frames. ], batch size: 63, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:47:44,796 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 01:47:45,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1395320.0, ans=0.0 2024-08-12 01:47:47,396 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 01:48:09,413 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 01:48:20,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1395620.0, ans=0.125 2024-08-12 01:48:29,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1395620.0, ans=0.0 2024-08-12 01:48:29,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1395620.0, ans=0.125 2024-08-12 01:48:34,008 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 01:48:40,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=15.0 2024-08-12 01:48:41,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0 2024-08-12 01:48:45,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9150, loss[loss=0.08824, beats_loss=0.01386, ecapa_loss=0.0002258, whisper_loss=0.07212, over 20175.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.0001859, whisper_loss=0.09191, over 3861698.25 frames. ], batch size: 89, lr: 6.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:48:58,171 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 01:48:59,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1395920.0, ans=0.0 2024-08-12 01:49:00,809 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 40 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 01:49:01,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1395920.0, ans=0.0 2024-08-12 01:49:04,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1395920.0, ans=0.1 2024-08-12 01:49:10,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0 2024-08-12 01:49:15,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. 
limit=10.0 2024-08-12 01:49:28,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.582e+01 2.877e+01 3.376e+01 5.392e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-12 01:49:53,987 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9200, loss[loss=0.1068, beats_loss=0.01057, ecapa_loss=0.0001746, whisper_loss=0.09449, over 18169.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01116, ecapa_loss=0.000184, whisper_loss=0.09194, over 3872103.21 frames. ], batch size: 72, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:49:56,927 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 01:50:08,158 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 01:50:13,382 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 01:50:13,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1396420.0, ans=0.125 2024-08-12 01:50:28,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1396520.0, ans=0.0 2024-08-12 01:50:38,544 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 01:50:45,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2024-08-12 01:51:02,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9250, loss[loss=0.1014, beats_loss=0.01277, ecapa_loss=0.0002007, whisper_loss=0.0866, over 15401.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001851, whisper_loss=0.09239, over 3883395.29 frames. 
], batch size: 63, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:51:16,125 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 01:51:20,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1396920.0, ans=0.125 2024-08-12 01:51:20,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2024-08-12 01:51:25,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1396920.0, ans=0.125 2024-08-12 01:51:29,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1397020.0, ans=0.125 2024-08-12 01:51:33,305 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 01:51:37,566 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 01:51:44,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.700e+01 2.936e+01 3.310e+01 8.820e+01, threshold=5.872e+01, percent-clipped=1.0 2024-08-12 01:51:44,337 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 01:52:04,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1397220.0, ans=0.1 2024-08-12 01:52:10,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9300, loss[loss=0.1214, beats_loss=0.01263, ecapa_loss=0.0001534, whisper_loss=0.1072, over 23612.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001857, whisper_loss=0.09238, over 3886058.36 frames. 
], batch size: 92, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:52:20,308 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 14 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 01:52:20,690 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:52:39,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1397520.0, ans=0.125 2024-08-12 01:52:41,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-12 01:52:46,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1397520.0, ans=0.125 2024-08-12 01:52:54,888 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:53:00,537 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.698e-01 2024-08-12 01:53:15,375 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 01:53:19,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9350, loss[loss=0.1189, beats_loss=0.01116, ecapa_loss=0.0001534, whisper_loss=0.1062, over 18101.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01113, ecapa_loss=0.0001849, whisper_loss=0.09183, over 3883233.29 frames. ], batch size: 66, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:53:23,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1397820.0, ans=0.125 2024-08-12 01:53:24,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-08-12 01:53:29,325 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 01:53:45,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1397920.0, ans=0.2 2024-08-12 01:53:53,289 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.150e-02 2024-08-12 01:53:57,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1398020.0, ans=0.025 2024-08-12 01:54:02,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.050e+01 2.487e+01 2.851e+01 3.233e+01 4.318e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-12 01:54:19,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1398220.0, ans=10.0 2024-08-12 01:54:19,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1398220.0, ans=10.0 2024-08-12 01:54:29,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9400, loss[loss=0.08714, beats_loss=0.01304, ecapa_loss=0.0001706, whisper_loss=0.0724, over 15319.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01111, ecapa_loss=0.0001843, whisper_loss=0.09262, over 3876395.73 frames. 
], batch size: 63, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:54:43,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1398420.0, ans=0.2 2024-08-12 01:54:58,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1398520.0, ans=0.2 2024-08-12 01:55:18,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1398620.0, ans=0.0 2024-08-12 01:55:26,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2024-08-12 01:55:38,123 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9450, loss[loss=0.1126, beats_loss=0.009962, ecapa_loss=0.0002189, whisper_loss=0.1005, over 21867.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001853, whisper_loss=0.09197, over 3912996.88 frames. ], batch size: 91, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:55:38,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.67 vs. 
limit=22.5 2024-08-12 01:55:41,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1398820.0, ans=0.0 2024-08-12 01:56:11,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1399020.0, ans=0.125 2024-08-12 01:56:20,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.626e+01 2.954e+01 3.375e+01 5.231e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-12 01:56:25,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1399120.0, ans=0.125 2024-08-12 01:56:29,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1399120.0, ans=0.05 2024-08-12 01:56:33,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2024-08-12 01:56:42,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-12 01:56:46,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9500, loss[loss=0.09157, beats_loss=0.0136, ecapa_loss=0.000152, whisper_loss=0.07645, over 22459.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01116, ecapa_loss=0.0001848, whisper_loss=0.09229, over 3891824.93 frames. ], batch size: 91, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:56:56,661 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 10 from Vox, 46 from AS 2024-08-12 01:57:05,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1399420.0, ans=0.5 2024-08-12 01:57:18,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1399520.0, ans=0.1 2024-08-12 01:57:24,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-12 01:57:26,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1399520.0, ans=0.07 2024-08-12 01:57:35,986 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 from AS 2024-08-12 01:57:37,196 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 26 from Vox, 30 from AS 2024-08-12 01:57:56,084 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9550, loss[loss=0.09007, beats_loss=0.0116, ecapa_loss=0.0002236, whisper_loss=0.07623, over 20109.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001863, whisper_loss=0.09212, over 3901482.70 frames. ], batch size: 87, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:57:58,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1399820.0, ans=0.125 2024-08-12 01:58:34,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1400020.0, ans=0.2 2024-08-12 01:58:36,698 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts.
20 from LS+wenet, 21 from Vox, 29 from AS 2024-08-12 01:58:40,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.623e+01 2.882e+01 3.186e+01 4.825e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 01:58:54,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1400220.0, ans=0.0 2024-08-12 01:59:06,738 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9600, loss[loss=0.09417, beats_loss=0.009471, ecapa_loss=0.000239, whisper_loss=0.08231, over 21291.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001856, whisper_loss=0.09196, over 3912931.04 frames. ], batch size: 92, lr: 6.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:59:23,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-12 01:59:31,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1400420.0, ans=0.015 2024-08-12 02:00:01,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-12 02:00:05,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-12 02:00:08,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2024-08-12 02:00:13,231 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 from AS 2024-08-12 02:00:16,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9650, loss[loss=0.106, beats_loss=0.01127, ecapa_loss=0.0001693, whisper_loss=0.09301, over 19810.00 frames.
], tot_loss[loss=0.1037, beats_loss=0.01126, ecapa_loss=0.0001848, whisper_loss=0.09058, over 3856865.44 frames. ], batch size: 80, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:00:21,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1400820.0, ans=0.0 2024-08-12 02:00:25,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1400820.0, ans=0.125 2024-08-12 02:00:26,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1400820.0, ans=0.2 2024-08-12 02:00:40,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1400920.0, ans=0.125 2024-08-12 02:00:45,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-12 02:00:54,759 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 from AS 2024-08-12 02:01:00,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.704e+01 3.034e+01 3.483e+01 7.919e+01, threshold=6.068e+01, percent-clipped=1.0 2024-08-12 02:01:26,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9700, loss[loss=0.1166, beats_loss=0.01238, ecapa_loss=0.0001624, whisper_loss=0.1026, over 19275.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001854, whisper_loss=0.09141, over 3831573.92 frames. ], batch size: 74, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:01:30,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1401320.0, ans=0.0 2024-08-12 02:01:42,899 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts.
26 from LS+wenet, 17 from Vox, 34 from AS 2024-08-12 02:01:50,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1401420.0, ans=0.02 2024-08-12 02:02:00,026 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 26 from Vox, 32 from AS 2024-08-12 02:02:30,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1401720.0, ans=0.1 2024-08-12 02:02:36,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9750, loss[loss=0.1109, beats_loss=0.009913, ecapa_loss=0.000216, whisper_loss=0.0988, over 22662.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001851, whisper_loss=0.09136, over 3854778.40 frames. ], batch size: 91, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:02:37,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1401820.0, ans=0.125 2024-08-12 02:02:38,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2024-08-12 02:02:38,737 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 13 from Vox, 36 from AS 2024-08-12 02:02:42,921 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 from AS 2024-08-12 02:02:59,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1401920.0, ans=0.125 2024-08-12 02:03:03,996 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs.
limit=15.0 2024-08-12 02:03:08,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1402020.0, ans=0.025 2024-08-12 02:03:20,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.664e+01 3.101e+01 3.565e+01 5.192e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-12 02:03:47,776 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9800, loss[loss=0.1141, beats_loss=0.01056, ecapa_loss=0.0001594, whisper_loss=0.102, over 22665.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01114, ecapa_loss=0.0001847, whisper_loss=0.09125, over 3840079.07 frames. ], batch size: 90, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:03:48,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1402320.0, ans=0.2 2024-08-12 02:04:13,766 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 14 from LS+wenet, 25 from Vox, 29 from AS 2024-08-12 02:04:16,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1402520.0, ans=0.2 2024-08-12 02:04:18,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.92 vs.
limit=22.5 2024-08-12 02:04:23,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1402520.0, ans=0.0 2024-08-12 02:04:34,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1402620.0, ans=0.125 2024-08-12 02:04:35,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1402620.0, ans=0.015 2024-08-12 02:04:35,905 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:04:42,800 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.947e+05 2024-08-12 02:04:50,820 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 from AS 2024-08-12 02:04:57,804 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 02:04:58,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9850, loss[loss=0.1065, beats_loss=0.01155, ecapa_loss=0.0001926, whisper_loss=0.09305, over 18794.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.0001849, whisper_loss=0.09244, over 3882437.19 frames.
], batch size: 74, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:04:59,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1402820.0, ans=0.0 2024-08-12 02:05:11,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1402920.0, ans=0.0 2024-08-12 02:05:31,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1403020.0, ans=0.125 2024-08-12 02:05:42,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.518e+01 2.832e+01 3.271e+01 6.017e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 02:05:48,016 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 from AS 2024-08-12 02:05:58,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1403220.0, ans=0.1 2024-08-12 02:06:01,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1403220.0, ans=0.95 2024-08-12 02:06:02,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1403220.0, ans=0.125 2024-08-12 02:06:05,161 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 02:06:09,019 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9900, loss[loss=0.08755, beats_loss=0.01231, ecapa_loss=0.000198, whisper_loss=0.07326, over 20855.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.0001844, whisper_loss=0.09196, over 3879583.57 frames.
], batch size: 90, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:06:25,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1403420.0, ans=0.0 2024-08-12 02:06:29,712 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 02:06:33,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1403420.0, ans=0.125 2024-08-12 02:06:42,325 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 from AS 2024-08-12 02:06:47,967 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 02:07:09,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1403720.0, ans=0.125 2024-08-12 02:07:19,998 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 9950, loss[loss=0.08243, beats_loss=0.01382, ecapa_loss=0.0001504, whisper_loss=0.06711, over 14734.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001852, whisper_loss=0.09221, over 3860408.03 frames. ], batch size: 58, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:07:20,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1403820.0, ans=0.125 2024-08-12 02:07:27,855 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 from AS 2024-08-12 02:07:47,933 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts.
15 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 02:07:57,252 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.901e-01 2024-08-12 02:08:03,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.549e+01 2.857e+01 3.293e+01 8.751e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 02:08:09,256 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 02:08:26,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1404220.0, ans=0.125 2024-08-12 02:08:29,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10000, loss[loss=0.09538, beats_loss=0.01226, ecapa_loss=0.0001921, whisper_loss=0.08119, over 21781.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0112, ecapa_loss=0.0001851, whisper_loss=0.09206, over 3841737.90 frames. ], batch size: 91, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:08:50,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1404420.0, ans=0.125 2024-08-12 02:08:52,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1404420.0, ans=0.125 2024-08-12 02:08:59,692 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 02:09:11,457 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 from AS 2024-08-12 02:09:22,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1404620.0, ans=0.0 2024-08-12 02:09:26,862 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts.
16 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 02:09:35,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1404720.0, ans=0.2 2024-08-12 02:09:40,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1404720.0, ans=0.05 2024-08-12 02:09:44,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10050, loss[loss=0.1113, beats_loss=0.01077, ecapa_loss=0.0001605, whisper_loss=0.09889, over 15678.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001844, whisper_loss=0.09194, over 3849876.12 frames. ], batch size: 61, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:09:55,565 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 from AS 2024-08-12 02:09:58,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1404920.0, ans=0.125 2024-08-12 02:10:00,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1404920.0, ans=0.2 2024-08-12 02:10:20,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1405020.0, ans=0.0 2024-08-12 02:10:27,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs.
limit=15.0 2024-08-12 02:10:28,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1405120.0, ans=0.2 2024-08-12 02:10:30,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.648e+01 2.983e+01 3.418e+01 4.523e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 02:10:49,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1405220.0, ans=0.125 2024-08-12 02:10:56,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1405220.0, ans=0.2 2024-08-12 02:11:00,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1405220.0, ans=0.125 2024-08-12 02:11:02,977 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10100, loss[loss=0.08049, beats_loss=0.0119, ecapa_loss=0.0002422, whisper_loss=0.06617, over 15043.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001843, whisper_loss=0.09293, over 3910538.08 frames. ], batch size: 71, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:11:18,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1405420.0, ans=0.0 2024-08-12 02:11:25,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1405420.0, ans=0.0 2024-08-12 02:11:27,100 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 from AS 2024-08-12 02:11:45,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1405520.0, ans=0.125 2024-08-12 02:11:54,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs.
limit=15.0 2024-08-12 02:11:59,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2024-08-12 02:12:08,605 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 02:12:16,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1405720.0, ans=0.0 2024-08-12 02:12:17,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1405720.0, ans=0.125 2024-08-12 02:12:25,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=22.5 2024-08-12 02:12:27,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10150, loss[loss=0.11, beats_loss=0.01205, ecapa_loss=0.0001777, whisper_loss=0.09619, over 20281.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.0001856, whisper_loss=0.09292, over 3898865.55 frames. ], batch size: 79, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:12:29,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1405820.0, ans=0.125 2024-08-12 02:12:44,850 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 31 from Vox, 27 from AS 2024-08-12 02:12:56,666 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 from AS 2024-08-12 02:13:23,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.579e+01 2.918e+01 3.241e+01 4.906e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-12 02:13:25,810 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
31 from LS+wenet, 26 from Vox, 36 from AS 2024-08-12 02:13:34,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1406120.0, ans=0.0 2024-08-12 02:13:46,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1406220.0, ans=0.125 2024-08-12 02:14:07,866 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10200, loss[loss=0.1091, beats_loss=0.008543, ecapa_loss=0.0002108, whisper_loss=0.09847, over 13803.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01108, ecapa_loss=0.0001864, whisper_loss=0.09242, over 3857328.15 frames. ], batch size: 54, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:14:52,549 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 from AS 2024-08-12 02:14:55,547 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 from AS 2024-08-12 02:14:57,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1406520.0, ans=0.125 2024-08-12 02:15:03,093 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 02:15:04,296 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2024-08-12 02:16:01,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10250, loss[loss=0.1055, beats_loss=0.01033, ecapa_loss=0.0001559, whisper_loss=0.09362, over 18741.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01106, ecapa_loss=0.0001857, whisper_loss=0.09266, over 3895238.99 frames.
], batch size: 68, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:16:07,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1406820.0, ans=0.0 2024-08-12 02:16:19,879 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 02:16:45,171 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 from AS 2024-08-12 02:17:04,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.647e+01 2.891e+01 3.478e+01 5.936e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-12 02:17:17,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1407120.0, ans=0.0 2024-08-12 02:17:20,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1407220.0, ans=0.0 2024-08-12 02:17:25,378 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 from AS 2024-08-12 02:17:43,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10300, loss[loss=0.1105, beats_loss=0.01229, ecapa_loss=0.0001023, whisper_loss=0.09723, over 17273.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001852, whisper_loss=0.09295, over 3886674.72 frames. ], batch size: 64, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:17:52,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2024-08-12 02:17:59,349 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 from AS 2024-08-12 02:18:08,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs.
limit=15.0 2024-08-12 02:18:10,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1407420.0, ans=0.0 2024-08-12 02:18:12,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-12 02:18:23,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1407520.0, ans=0.0 2024-08-12 02:18:58,500 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 22 from Vox, 30 from AS 2024-08-12 02:19:00,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1407720.0, ans=0.125 2024-08-12 02:19:11,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10350, loss[loss=0.1181, beats_loss=0.0113, ecapa_loss=0.0001867, whisper_loss=0.1049, over 22732.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01113, ecapa_loss=0.0001857, whisper_loss=0.09288, over 3908782.78 frames. ], batch size: 87, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:19:56,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.600e+01 2.842e+01 3.107e+01 4.520e+01, threshold=5.684e+01, percent-clipped=0.0 2024-08-12 02:20:01,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2024-08-12 02:20:16,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1408220.0, ans=0.125 2024-08-12 02:20:25,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10400, loss[loss=0.08666, beats_loss=0.01128, ecapa_loss=0.0002211, whisper_loss=0.07317, over 14127.00 frames.
], tot_loss[loss=0.106, beats_loss=0.01119, ecapa_loss=0.0001844, whisper_loss=0.09295, over 3918151.44 frames. ], batch size: 57, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:20:34,460 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 22 from LS+wenet, 10 from Vox, 21 from AS 2024-08-12 02:20:49,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1408420.0, ans=0.0 2024-08-12 02:21:06,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1408520.0, ans=0.1 2024-08-12 02:21:25,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-08-12 02:21:37,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10450, loss[loss=0.1012, beats_loss=0.01198, ecapa_loss=0.0001712, whisper_loss=0.08747, over 22368.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01127, ecapa_loss=0.0001835, whisper_loss=0.09189, over 3909761.71 frames. ], batch size: 91, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:21:39,093 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-12 02:21:43,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-12 02:22:03,576 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts.
25 from LS+wenet, 14 from Vox, 25 from AS 2024-08-12 02:22:21,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.627e+01 2.925e+01 3.348e+01 4.455e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 02:22:29,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1409120.0, ans=0.125 2024-08-12 02:22:49,594 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10500, loss[loss=0.1064, beats_loss=0.01148, ecapa_loss=0.000139, whisper_loss=0.09352, over 16049.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001837, whisper_loss=0.09195, over 3875135.82 frames. ], batch size: 62, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:22:49,812 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 26 from Vox, 30 from AS 2024-08-12 02:23:01,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-08-12 02:23:11,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1409420.0, ans=10.0 2024-08-12 02:23:27,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1409520.0, ans=0.2 2024-08-12 02:24:02,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10550, loss[loss=0.1126, beats_loss=0.01083, ecapa_loss=0.0001971, whisper_loss=0.09979, over 14051.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01125, ecapa_loss=0.0001838, whisper_loss=0.09094, over 3845077.77 frames.
], batch size: 57, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:24:04,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1409820.0, ans=0.2 2024-08-12 02:24:09,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1409820.0, ans=0.0 2024-08-12 02:24:27,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1409920.0, ans=0.1 2024-08-12 02:24:39,040 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 18 from Vox, 44 from AS 2024-08-12 02:24:46,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.599e+01 2.845e+01 3.296e+01 6.744e+01, threshold=5.691e+01, percent-clipped=1.0 2024-08-12 02:24:59,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1410220.0, ans=0.09899494936611666 2024-08-12 02:25:12,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1410320.0, ans=0.125 2024-08-12 02:25:13,088 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10600, loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.000175, whisper_loss=0.09087, over 18569.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01133, ecapa_loss=0.0001838, whisper_loss=0.09002, over 3861234.67 frames.
], batch size: 75, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:25:16,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1410320.0, ans=0.125 2024-08-12 02:25:26,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1410420.0, ans=0.0 2024-08-12 02:25:30,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1410420.0, ans=0.125 2024-08-12 02:25:30,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1410420.0, ans=0.2 2024-08-12 02:25:37,233 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 14 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 02:25:47,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1410520.0, ans=0.0 2024-08-12 02:25:53,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1410620.0, ans=0.2 2024-08-12 02:26:03,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1410620.0, ans=0.0 2024-08-12 02:26:13,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1410720.0, ans=0.0 2024-08-12 02:26:22,466 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10650, loss[loss=0.1192, beats_loss=0.01075, ecapa_loss=0.0001697, whisper_loss=0.1068, over 22041.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0112, ecapa_loss=0.0001836, whisper_loss=0.09113, over 3899416.92 frames. ], batch size: 85, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:26:26,570 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts.
23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 02:26:33,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1410820.0, ans=6.0 2024-08-12 02:26:36,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1410920.0, ans=0.025 2024-08-12 02:26:42,820 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 02:26:53,621 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 02:27:04,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.646e+01 2.959e+01 3.392e+01 4.637e+01, threshold=5.918e+01, percent-clipped=0.0 2024-08-12 02:27:06,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1411120.0, ans=0.125 2024-08-12 02:27:17,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1411220.0, ans=0.125 2024-08-12 02:27:26,796 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 02:27:30,810 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10700, loss[loss=0.1154, beats_loss=0.009859, ecapa_loss=0.0001875, whisper_loss=0.1037, over 23000.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01117, ecapa_loss=0.0001833, whisper_loss=0.09144, over 3890856.61 frames. ], batch size: 91, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:27:32,284 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 02:27:33,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1411320.0, ans=0.0 2024-08-12 02:27:38,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-12 02:27:38,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-08-12 02:27:45,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1411420.0, ans=0.1 2024-08-12 02:27:45,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1411420.0, ans=0.0 2024-08-12 02:27:52,295 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 02:27:53,712 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 02:27:59,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1411520.0, ans=0.125 2024-08-12 02:28:06,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1411520.0, ans=0.125 2024-08-12 02:28:06,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1411520.0, ans=0.0 2024-08-12 02:28:25,079 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 02:28:40,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10750, loss[loss=0.07174, beats_loss=0.01219, ecapa_loss=0.0001808, whisper_loss=0.05775, over 14831.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01121, ecapa_loss=0.0001838, whisper_loss=0.09173, over 3885041.56 frames. ], batch size: 63, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:28:47,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1411820.0, ans=0.0 2024-08-12 02:28:51,057 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 02:28:56,611 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 02:29:02,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411920.0, ans=0.0 2024-08-12 02:29:11,286 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 02:29:22,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.596e+01 2.921e+01 3.440e+01 9.548e+01, threshold=5.843e+01, percent-clipped=1.0 2024-08-12 02:29:28,540 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 02:29:28,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1412120.0, ans=10.0 2024-08-12 02:29:40,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1412220.0, ans=0.0 2024-08-12 02:29:44,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1412220.0, ans=0.125 2024-08-12 02:29:48,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10800, loss[loss=0.1093, beats_loss=0.008948, ecapa_loss=0.000171, whisper_loss=0.09862, over 14537.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0112, ecapa_loss=0.0001825, whisper_loss=0.09241, over 3879681.10 frames. 
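Each `optim.py:476` line reports grad-norm quartiles (min, 25%, median, 75%, max) plus a clipping threshold. In every entry shown here the threshold equals `Clipping_scale` times the median quartile (e.g. 2.0 × 2.845e+01 = 5.69e+01 ≈ 5.691e+01), so a plausible reading, sketched below under that assumption (the helper names are ours), is a median-based adaptive clipping threshold:

```python
import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    # Assumption consistent with every optim.py:476 line in this log:
    # threshold = clipping_scale * median of recently observed grad norms.
    return clipping_scale * statistics.median(recent_grad_norms)

def percent_clipped(recent_grad_norms, threshold):
    # Percentage of steps whose grad norm exceeded the threshold.
    clipped = sum(1 for g in recent_grad_norms if g > threshold)
    return 100.0 * clipped / len(recent_grad_norms)

# The logged quartiles (min, 25%, median, 75%, max) from the first entry:
quartiles = [20.86, 25.99, 28.45, 32.96, 67.44]
print(clipping_threshold(quartiles))  # 56.9, matching threshold=5.691e+01
```

The log's `percent-clipped` is computed over the full window of recent batches, not just the five quartile points, so only the threshold (not the percentage) can be checked directly against these lines.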
], batch size: 57, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:30:10,744 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 02:30:21,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1412520.0, ans=0.2 2024-08-12 02:30:36,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-12 02:30:41,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1412720.0, ans=0.09899494936611666 2024-08-12 02:30:53,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1412720.0, ans=0.1 2024-08-12 02:30:56,331 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10850, loss[loss=0.08947, beats_loss=0.01252, ecapa_loss=0.0001928, whisper_loss=0.07502, over 19766.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01128, ecapa_loss=0.0001836, whisper_loss=0.09214, over 3838288.40 frames. ], batch size: 85, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:31:00,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.30 vs. limit=22.5 2024-08-12 02:31:00,957 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 02:31:04,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1412820.0, ans=22.5 2024-08-12 02:31:23,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1413020.0, ans=0.125 2024-08-12 02:31:31,378 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 02:31:33,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1413020.0, ans=0.2 2024-08-12 02:31:39,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.708e+01 3.088e+01 3.544e+01 8.247e+01, threshold=6.177e+01, percent-clipped=2.0 2024-08-12 02:31:47,828 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 02:31:49,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1413120.0, ans=0.125 2024-08-12 02:32:06,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10900, loss[loss=0.09675, beats_loss=0.01094, ecapa_loss=0.0001492, whisper_loss=0.08431, over 17745.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01117, ecapa_loss=0.0001853, whisper_loss=0.09288, over 3861483.04 frames. ], batch size: 67, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:32:07,043 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 02:32:26,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5 2024-08-12 02:32:27,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1413420.0, ans=0.125 2024-08-12 02:32:36,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1413520.0, ans=0.125 2024-08-12 02:33:04,671 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 02:33:07,249 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
36 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 02:33:07,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1413720.0, ans=0.0 2024-08-12 02:33:17,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1413820.0, ans=0.0 2024-08-12 02:33:18,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 10950, loss[loss=0.1014, beats_loss=0.01344, ecapa_loss=0.0001744, whisper_loss=0.0862, over 19310.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01117, ecapa_loss=0.0001844, whisper_loss=0.09301, over 3873931.99 frames. ], batch size: 75, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:33:18,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1413820.0, ans=0.2 2024-08-12 02:33:26,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1413820.0, ans=0.125 2024-08-12 02:33:33,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1413920.0, ans=0.125 2024-08-12 02:33:33,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1413920.0, ans=0.125 2024-08-12 02:33:48,676 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 02:33:52,706 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 02:34:00,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.632e+01 3.025e+01 3.424e+01 7.059e+01, threshold=6.051e+01, percent-clipped=1.0 2024-08-12 02:34:25,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1414220.0, ans=0.0 2024-08-12 02:34:27,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11000, loss[loss=0.1092, beats_loss=0.01126, ecapa_loss=0.0001652, whisper_loss=0.0963, over 15830.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001848, whisper_loss=0.09255, over 3882485.68 frames. ], batch size: 61, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:34:53,376 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 02:34:59,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1414520.0, ans=0.125 2024-08-12 02:35:05,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1414520.0, ans=0.125 2024-08-12 02:35:20,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1414620.0, ans=0.0 2024-08-12 02:35:26,403 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-12 02:35:33,146 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 02:35:35,800 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11050, loss[loss=0.0915, beats_loss=0.0128, ecapa_loss=0.0001454, whisper_loss=0.07725, over 18556.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001857, whisper_loss=0.0922, over 3902178.21 frames. 
], batch size: 73, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:35:36,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1414820.0, ans=0.1 2024-08-12 02:35:45,203 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 02:35:50,656 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 02:35:54,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1414920.0, ans=0.2 2024-08-12 02:36:03,688 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.417e-03 2024-08-12 02:36:06,162 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 02:36:06,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-08-12 02:36:13,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2024-08-12 02:36:18,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.531e+01 2.878e+01 3.285e+01 6.916e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 02:36:28,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1415120.0, ans=0.0 2024-08-12 02:36:30,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1415220.0, ans=0.125 2024-08-12 02:36:36,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1415220.0, ans=0.125 2024-08-12 02:36:45,015 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11100, loss[loss=0.1129, beats_loss=0.009443, ecapa_loss=0.0002276, whisper_loss=0.1012, over 21097.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01113, ecapa_loss=0.0001872, whisper_loss=0.09253, over 3881606.58 frames. ], batch size: 89, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:36:51,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1415320.0, ans=0.125 2024-08-12 02:36:51,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1415320.0, ans=0.125 2024-08-12 02:36:52,246 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 02:37:14,180 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
36 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 02:37:20,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1415520.0, ans=0.0 2024-08-12 02:37:24,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1415520.0, ans=0.125 2024-08-12 02:37:35,452 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 02:37:46,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1415720.0, ans=0.125 2024-08-12 02:37:52,240 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 02:37:53,527 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 02:37:56,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11150, loss[loss=0.1277, beats_loss=0.007548, ecapa_loss=0.0001998, whisper_loss=0.1181, over 18094.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001852, whisper_loss=0.09272, over 3880350.91 frames. ], batch size: 69, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:38:00,362 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 02:38:20,430 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 02:38:30,250 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 02:38:38,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2024-08-12 02:38:39,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.570e+01 2.845e+01 3.196e+01 4.459e+01, threshold=5.690e+01, percent-clipped=0.0 2024-08-12 02:38:43,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1416120.0, ans=0.0 2024-08-12 02:38:57,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1416220.0, ans=0.125 2024-08-12 02:39:06,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11200, loss[loss=0.09977, beats_loss=0.01142, ecapa_loss=0.0002057, whisper_loss=0.08629, over 21979.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001851, whisper_loss=0.09246, over 3894013.98 frames. ], batch size: 88, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:39:10,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-12 02:39:30,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-12 02:39:43,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1416520.0, ans=0.125 2024-08-12 02:39:47,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=12.0 2024-08-12 02:39:54,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1416620.0, ans=0.125 2024-08-12 02:40:07,479 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 02:40:09,128 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:40:10,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1416720.0, ans=0.125 2024-08-12 02:40:10,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2024-08-12 02:40:12,555 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 02:40:14,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1416720.0, ans=0.0 2024-08-12 02:40:16,502 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11250, loss[loss=0.09133, beats_loss=0.012, ecapa_loss=0.0001912, whisper_loss=0.07742, over 15356.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001871, whisper_loss=0.09238, over 3911429.89 frames. ], batch size: 63, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:40:20,973 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 02:40:22,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1416820.0, ans=0.125 2024-08-12 02:40:22,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1416820.0, ans=0.125 2024-08-12 02:40:22,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1416820.0, ans=0.025 2024-08-12 02:40:38,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1416920.0, ans=0.125 2024-08-12 02:40:48,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2024-08-12 02:40:52,833 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 02:40:59,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.698e+01 3.076e+01 3.539e+01 6.948e+01, threshold=6.153e+01, percent-clipped=1.0 2024-08-12 02:41:00,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-12 02:41:18,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1417220.0, ans=0.2 2024-08-12 02:41:22,198 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 02:41:25,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11300, loss[loss=0.1158, beats_loss=0.008687, ecapa_loss=0.0001908, whisper_loss=0.1052, over 22078.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001882, whisper_loss=0.09261, over 3896707.23 frames. ], batch size: 85, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:41:31,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1417320.0, ans=0.0 2024-08-12 02:41:33,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1417320.0, ans=0.125 2024-08-12 02:41:34,873 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 02:41:43,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1417420.0, ans=0.2 2024-08-12 02:42:10,002 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 02:42:28,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2024-08-12 02:42:35,639 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11350, loss[loss=0.07717, beats_loss=0.01052, ecapa_loss=0.0001972, whisper_loss=0.06468, over 14398.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.000188, whisper_loss=0.09275, over 3873207.79 frames. 
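The `scaling.py:214` lines record `ScheduledFloat` hyperparameters (skip rates, dropout probabilities, balancer limits) whose values depend on `batch_count`. A minimal sketch of such a schedule as piecewise-linear interpolation between breakpoints; the breakpoint values below are hypothetical illustrations, not taken from icefall's actual schedules:

```python
def scheduled_float(batch_count, points):
    # Piecewise-linear schedule over batch_count, held constant outside the
    # breakpoint range. `points` is a sorted list of (batch_count, value).
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Hypothetical schedule: a skip rate that decays from 0.5 to 0.0 over the
# first 4000 batches, then stays at 0.0. At batch_count=1410220 (as in the
# log above) the schedule has long since reached its final value.
print(scheduled_float(1410220.0, [(0.0, 0.5), (4000.0, 0.0)]))  # 0.0
```

This explains why many logged values (e.g. `attention_skip_rate`, `ff2_skip_rate`) sit at their terminal constants (`ans=0.0`, `ans=0.125`) this deep into training.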
], batch size: 58, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:42:37,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1417820.0, ans=0.0 2024-08-12 02:42:40,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1417820.0, ans=0.125 2024-08-12 02:42:41,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1417820.0, ans=0.0 2024-08-12 02:42:45,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1417820.0, ans=0.125 2024-08-12 02:42:53,468 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 02:43:03,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1418020.0, ans=0.125 2024-08-12 02:43:03,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1418020.0, ans=0.125 2024-08-12 02:43:18,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.545e+01 2.820e+01 3.202e+01 5.315e+01, threshold=5.639e+01, percent-clipped=0.0 2024-08-12 02:43:27,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1418120.0, ans=0.2 2024-08-12 02:43:28,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1418120.0, ans=0.2 2024-08-12 02:43:32,590 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 02:43:35,555 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 02:43:39,785 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 02:43:43,865 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 02:43:45,031 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11400, loss[loss=0.1004, beats_loss=0.01034, ecapa_loss=0.0001818, whisper_loss=0.0882, over 19079.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01091, ecapa_loss=0.0001875, whisper_loss=0.09353, over 3843307.45 frames. ], batch size: 75, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:43:59,185 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 02:44:11,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1418520.0, ans=0.95 2024-08-12 02:44:11,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2024-08-12 02:44:13,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1418520.0, ans=0.05 2024-08-12 02:44:16,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1418520.0, ans=0.07 2024-08-12 02:44:21,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1418520.0, ans=0.1 2024-08-12 02:44:37,477 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
14 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 02:44:39,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1418720.0, ans=0.0 2024-08-12 02:44:53,374 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11450, loss[loss=0.1131, beats_loss=0.008818, ecapa_loss=0.0001788, whisper_loss=0.1024, over 18470.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01095, ecapa_loss=0.0001869, whisper_loss=0.09268, over 3844834.24 frames. ], batch size: 71, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:45:08,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1418920.0, ans=0.125 2024-08-12 02:45:17,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1418920.0, ans=0.125 2024-08-12 02:45:18,594 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 02:45:23,042 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.468e+00 2024-08-12 02:45:36,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.629e+01 3.024e+01 3.484e+01 5.992e+01, threshold=6.048e+01, percent-clipped=1.0 2024-08-12 02:45:38,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2024-08-12 02:45:49,085 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 02:45:56,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1419220.0, ans=0.05 2024-08-12 02:46:02,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11500, loss[loss=0.1149, beats_loss=0.00945, ecapa_loss=0.0002226, whisper_loss=0.1032, over 17298.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001854, whisper_loss=0.09254, over 3847260.49 frames. ], batch size: 72, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:46:08,165 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 02:46:11,794 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 02:46:24,181 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 02:46:30,892 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 02:46:31,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1419520.0, ans=0.125 2024-08-12 02:46:37,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-12 02:46:37,684 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 02:47:10,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1419820.0, ans=0.07 2024-08-12 02:47:11,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11550, loss[loss=0.1004, beats_loss=0.00958, ecapa_loss=0.0001882, whisper_loss=0.0889, over 16615.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001859, whisper_loss=0.09231, over 3865143.74 frames. 
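The `grad_scale` is pinned at 1.152921504606847e+18 throughout these entries. That value is exactly 2^60, which is what a dynamic AMP loss scaler produces, since such scalers grow and shrink the scale by powers of two; a quick check:

```python
import math

# grad_scale as printed in the log lines above; dynamic AMP loss scalers
# adjust the scale by powers of two, and this value is exactly 2**60.
logged_grad_scale = 1.152921504606847e+18
print(math.log2(logged_grad_scale))  # 60.0
```

A constant scale over many batches indicates the scaler is stable: no overflows are forcing it down, and it has reached (or is holding at) its current growth plateau.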
], batch size: 66, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:47:11,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1419820.0, ans=0.0 2024-08-12 02:47:22,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1419820.0, ans=0.1 2024-08-12 02:47:41,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1420020.0, ans=0.1 2024-08-12 02:47:53,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.668e+01 3.016e+01 3.497e+01 6.036e+01, threshold=6.031e+01, percent-clipped=0.0 2024-08-12 02:47:58,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1420120.0, ans=0.2 2024-08-12 02:48:02,408 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 02:48:19,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1420320.0, ans=0.1 2024-08-12 02:48:20,585 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11600, loss[loss=0.1226, beats_loss=0.008793, ecapa_loss=0.0002571, whisper_loss=0.1112, over 20139.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001864, whisper_loss=0.09253, over 3880134.84 frames. ], batch size: 89, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:48:39,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1420420.0, ans=0.1 2024-08-12 02:48:46,081 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 02:49:07,791 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 02:49:11,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1420620.0, ans=0.1 2024-08-12 02:49:11,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1420620.0, ans=0.125 2024-08-12 02:49:14,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:24,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-12 02:49:27,128 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 02:49:29,597 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11650, loss[loss=0.1112, beats_loss=0.01041, ecapa_loss=0.0002015, whisper_loss=0.09874, over 21923.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.011, ecapa_loss=0.000186, whisper_loss=0.09305, over 3897781.78 frames. ], batch size: 84, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:49:34,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1420820.0, ans=0.0 2024-08-12 02:49:42,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1420920.0, ans=0.2 2024-08-12 02:49:43,501 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-12 02:49:44,753 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 02:49:55,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1421020.0, ans=0.04949747468305833 2024-08-12 02:50:12,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.632e+01 2.905e+01 3.202e+01 4.413e+01, threshold=5.810e+01, percent-clipped=0.0 2024-08-12 02:50:21,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1421120.0, ans=0.125 2024-08-12 02:50:26,026 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 02:50:31,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-08-12 02:50:38,279 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11700, loss[loss=0.09133, beats_loss=0.01277, ecapa_loss=0.0001565, whisper_loss=0.07699, over 15553.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01111, ecapa_loss=0.0001836, whisper_loss=0.0934, over 3894308.78 frames. ], batch size: 61, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:50:39,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2024-08-12 02:50:39,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.66 vs. 
limit=15.0 2024-08-12 02:50:40,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1421320.0, ans=0.2 2024-08-12 02:50:49,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1421320.0, ans=0.125 2024-08-12 02:50:56,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1421420.0, ans=0.125 2024-08-12 02:51:00,230 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 02:51:02,980 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 02:51:20,860 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 02:51:38,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1421720.0, ans=0.2 2024-08-12 02:51:39,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1421720.0, ans=0.1 2024-08-12 02:51:40,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1421720.0, ans=0.0 2024-08-12 02:51:42,926 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 02:51:46,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11750, loss[loss=0.1048, beats_loss=0.01161, ecapa_loss=0.0001902, whisper_loss=0.09134, over 22113.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.0001822, whisper_loss=0.09302, over 3920128.56 frames. 
], batch size: 90, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:51:47,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1421820.0, ans=0.0 2024-08-12 02:51:50,012 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 02:52:13,050 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 02:52:26,818 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 02:52:28,084 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 30 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 02:52:29,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.531e+01 2.844e+01 3.355e+01 7.826e+01, threshold=5.688e+01, percent-clipped=1.0 2024-08-12 02:52:55,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11800, loss[loss=0.1103, beats_loss=0.01019, ecapa_loss=0.0001579, whisper_loss=0.09854, over 24136.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001815, whisper_loss=0.09257, over 3870374.67 frames. ], batch size: 92, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:53:03,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1422320.0, ans=0.125 2024-08-12 02:53:25,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1422520.0, ans=0.2 2024-08-12 02:53:25,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1422520.0, ans=0.125 2024-08-12 02:53:38,559 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 02:53:57,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1422720.0, ans=0.1 2024-08-12 02:54:04,592 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11850, loss[loss=0.1192, beats_loss=0.01101, ecapa_loss=0.0001711, whisper_loss=0.1065, over 23108.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01126, ecapa_loss=0.0001809, whisper_loss=0.09265, over 3898593.03 frames. ], batch size: 93, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:54:41,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1423020.0, ans=0.0 2024-08-12 02:54:47,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.632e+01 2.955e+01 3.333e+01 2.077e+02, threshold=5.910e+01, percent-clipped=1.0 2024-08-12 02:55:01,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1423220.0, ans=0.09899494936611666 2024-08-12 02:55:09,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1423220.0, ans=0.125 2024-08-12 02:55:12,388 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11900, loss[loss=0.09959, beats_loss=0.008451, ecapa_loss=0.0002535, whisper_loss=0.0886, over 18156.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01131, ecapa_loss=0.0001818, whisper_loss=0.0923, over 3922589.55 frames. ], batch size: 77, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:55:12,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1423320.0, ans=0.125 2024-08-12 02:55:18,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. 
limit=22.5 2024-08-12 02:55:23,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1423320.0, ans=15.0 2024-08-12 02:55:26,614 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 02:55:32,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1423420.0, ans=0.125 2024-08-12 02:55:37,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1423420.0, ans=0.1 2024-08-12 02:55:56,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1423620.0, ans=0.125 2024-08-12 02:56:00,202 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-12 02:56:02,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.97 vs. limit=5.0 2024-08-12 02:56:03,049 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 02:56:12,970 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 02:56:20,003 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:56:22,513 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 11950, loss[loss=0.09916, beats_loss=0.01189, ecapa_loss=0.000159, whisper_loss=0.08569, over 22842.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01127, ecapa_loss=0.0001824, whisper_loss=0.09187, over 3915438.93 frames. 
], batch size: 90, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:56:27,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1423820.0, ans=0.125 2024-08-12 02:56:28,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1423820.0, ans=0.125 2024-08-12 02:56:35,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1423920.0, ans=0.0 2024-08-12 02:56:38,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.99 vs. limit=22.5 2024-08-12 02:56:43,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1423920.0, ans=0.0 2024-08-12 02:56:53,436 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 02:56:59,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1424020.0, ans=0.125 2024-08-12 02:56:59,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1424020.0, ans=0.125 2024-08-12 02:57:06,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.496e+01 2.723e+01 3.288e+01 6.365e+01, threshold=5.445e+01, percent-clipped=1.0 2024-08-12 02:57:06,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1424120.0, ans=0.125 2024-08-12 02:57:10,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1424120.0, ans=0.125 2024-08-12 02:57:14,587 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 02:57:19,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1424220.0, ans=0.07 2024-08-12 02:57:25,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1424220.0, ans=0.0 2024-08-12 02:57:25,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0 2024-08-12 02:57:31,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12000, loss[loss=0.1046, beats_loss=0.01139, ecapa_loss=0.0001594, whisper_loss=0.0916, over 19781.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01122, ecapa_loss=0.0001833, whisper_loss=0.09196, over 3894737.84 frames. ], batch size: 75, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:57:31,488 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 02:58:10,738 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006161, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 02:58:28,738 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on SV_voxceleb1: loss=0.005027, beats_loss=0, ecapa_loss=0.0005027, whisper_loss=0, over 939242.00 frames. 2024-08-12 03:00:26,462 INFO [train_multi_KD3.py:1149] (1/4) Epoch 10, validation on AT_audioset: loss=0.02469, beats_loss=0.02469, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 03:00:26,466 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 03:00:29,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1424320.0, ans=0.05 2024-08-12 03:00:42,927 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-12 03:00:44,213 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 03:00:45,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1424420.0, ans=0.125 2024-08-12 03:00:48,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1424420.0, ans=0.125 2024-08-12 03:00:52,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1424520.0, ans=0.2 2024-08-12 03:00:55,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1424520.0, ans=0.125 2024-08-12 03:00:57,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1424520.0, ans=0.0 2024-08-12 03:01:09,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1424620.0, ans=0.0 2024-08-12 03:01:10,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1424620.0, ans=0.125 2024-08-12 03:01:15,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1424620.0, ans=0.0 2024-08-12 03:01:26,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1424720.0, ans=0.0 2024-08-12 03:01:36,060 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12050, loss[loss=0.09352, beats_loss=0.01089, ecapa_loss=0.0001583, whisper_loss=0.08105, over 18276.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01121, ecapa_loss=0.0001843, whisper_loss=0.09151, over 3837399.31 frames. 
], batch size: 73, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:01:36,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1424820.0, ans=0.125 2024-08-12 03:01:50,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1424920.0, ans=0.2 2024-08-12 03:01:51,547 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 03:02:00,219 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 03:02:19,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.644e+01 2.915e+01 3.248e+01 4.728e+01, threshold=5.830e+01, percent-clipped=0.0 2024-08-12 03:02:36,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1425220.0, ans=0.0 2024-08-12 03:02:45,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12100, loss[loss=0.0919, beats_loss=0.009198, ecapa_loss=0.0002205, whisper_loss=0.08049, over 15830.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01117, ecapa_loss=0.0001836, whisper_loss=0.09057, over 3815968.43 frames. ], batch size: 67, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:02:46,158 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 03:02:58,571 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 03:03:09,553 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 03:03:15,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.26 vs. 
limit=12.0 2024-08-12 03:03:26,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2024-08-12 03:03:27,229 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 03:03:48,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1425720.0, ans=0.0 2024-08-12 03:03:48,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1425720.0, ans=0.1 2024-08-12 03:03:55,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12150, loss[loss=0.09264, beats_loss=0.01384, ecapa_loss=0.0001526, whisper_loss=0.07727, over 16702.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001856, whisper_loss=0.09156, over 3835545.65 frames. ], batch size: 67, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:04:03,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1425820.0, ans=0.1 2024-08-12 03:04:12,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1425920.0, ans=0.0 2024-08-12 03:04:23,020 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 11 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 03:04:35,602 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
18 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-12 03:04:38,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.674e+01 3.067e+01 3.443e+01 6.340e+01, threshold=6.135e+01, percent-clipped=1.0 2024-08-12 03:04:58,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1426220.0, ans=0.125 2024-08-12 03:05:04,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12200, loss[loss=0.1066, beats_loss=0.008369, ecapa_loss=0.0002114, whisper_loss=0.0961, over 18524.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001858, whisper_loss=0.09209, over 3842790.86 frames. ], batch size: 71, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:05:46,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1426620.0, ans=0.1 2024-08-12 03:05:49,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1426620.0, ans=0.125 2024-08-12 03:05:59,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1426720.0, ans=0.125 2024-08-12 03:05:59,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1426720.0, ans=0.0 2024-08-12 03:06:05,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-08-12 03:06:07,526 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 03:06:13,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12250, loss[loss=0.1127, beats_loss=0.01359, ecapa_loss=0.0001856, whisper_loss=0.09724, over 22627.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.000185, whisper_loss=0.09173, over 3816171.37 frames. ], batch size: 90, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:06:44,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1427020.0, ans=0.0 2024-08-12 03:06:56,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.672e+01 2.930e+01 3.249e+01 5.324e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-12 03:07:01,268 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.730e+05 2024-08-12 03:07:13,921 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 03:07:23,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12300, loss[loss=0.08423, beats_loss=0.01572, ecapa_loss=0.0001375, whisper_loss=0.06713, over 16355.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001848, whisper_loss=0.09182, over 3846640.23 frames. ], batch size: 64, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:07:24,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1427320.0, ans=0.125 2024-08-12 03:07:37,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1427420.0, ans=0.1 2024-08-12 03:07:50,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1427520.0, ans=0.125 2024-08-12 03:08:03,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. 
limit=10.0 2024-08-12 03:08:28,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2024-08-12 03:08:32,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12350, loss[loss=0.1058, beats_loss=0.01146, ecapa_loss=0.0001636, whisper_loss=0.09268, over 17157.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.0001869, whisper_loss=0.09181, over 3853890.46 frames. ], batch size: 67, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:08:35,914 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 03:08:53,790 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 12 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 03:09:08,737 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 03:09:18,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.674e+01 3.021e+01 3.383e+01 7.125e+01, threshold=6.043e+01, percent-clipped=2.0 2024-08-12 03:09:24,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1428120.0, ans=0.2 2024-08-12 03:09:34,878 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 03:09:38,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1428220.0, ans=0.0 2024-08-12 03:09:43,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1428220.0, ans=0.2 2024-08-12 03:09:48,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12400, loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001942, whisper_loss=0.09017, over 20328.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001867, whisper_loss=0.09256, over 3878939.70 frames. 
], batch size: 81, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:09:52,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-12 03:09:58,176 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 03:10:06,570 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 03:10:16,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1428520.0, ans=0.0 2024-08-12 03:10:25,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1428520.0, ans=0.0 2024-08-12 03:10:27,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1428520.0, ans=0.0 2024-08-12 03:10:30,374 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:10:30,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1428520.0, ans=0.125 2024-08-12 03:10:32,683 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-12 03:10:32,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1428620.0, ans=0.0 2024-08-12 03:11:02,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12450, loss[loss=0.1188, beats_loss=0.01084, ecapa_loss=0.0001702, whisper_loss=0.1063, over 24119.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01095, ecapa_loss=0.0001854, whisper_loss=0.0926, over 3888894.57 frames. 
], batch size: 94, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:11:08,270 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 10 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 03:11:10,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1428820.0, ans=0.2 2024-08-12 03:11:32,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1429020.0, ans=0.0 2024-08-12 03:11:41,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1429020.0, ans=0.0 2024-08-12 03:11:46,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+01 2.502e+01 2.764e+01 3.282e+01 5.590e+01, threshold=5.528e+01, percent-clipped=0.0 2024-08-12 03:12:01,732 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 03:12:11,312 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.457e+02 2024-08-12 03:12:12,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1429220.0, ans=0.1 2024-08-12 03:12:14,739 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12500, loss[loss=0.1311, beats_loss=0.01031, ecapa_loss=0.0001742, whisper_loss=0.119, over 23143.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01095, ecapa_loss=0.0001847, whisper_loss=0.09341, over 3889635.24 frames. ], batch size: 89, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:12:26,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1429320.0, ans=0.0 2024-08-12 03:12:38,929 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 03:12:48,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1429520.0, ans=0.125 2024-08-12 03:12:52,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-12 03:13:00,056 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 03:13:04,343 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 03:13:05,844 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 03:13:07,366 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 03:13:24,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1429720.0, ans=0.0 2024-08-12 03:13:27,048 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12550, loss[loss=0.131, beats_loss=0.01038, ecapa_loss=0.0001864, whisper_loss=0.1187, over 17673.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01099, ecapa_loss=0.0001844, whisper_loss=0.09322, over 3886972.45 frames. ], batch size: 69, lr: 6.20e-03, grad_scale: 2.305843009213694e+18 2024-08-12 03:13:29,842 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 03:13:32,700 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 03:13:40,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429920.0, ans=0.1 2024-08-12 03:13:53,802 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 03:14:02,209 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 03:14:08,161 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 32 from Vox, 23 fro AS 2024-08-12 03:14:12,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.663e+01 2.938e+01 3.317e+01 5.229e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 03:14:23,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1430220.0, ans=0.125 2024-08-12 03:14:28,627 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 03:14:38,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12600, loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0001549, whisper_loss=0.09425, over 20891.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01101, ecapa_loss=0.000184, whisper_loss=0.09387, over 3905115.48 frames. ], batch size: 79, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:14:40,928 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 03:15:10,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1430520.0, ans=0.125 2024-08-12 03:15:13,621 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 03:15:43,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1430720.0, ans=0.0 2024-08-12 03:15:52,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12650, loss[loss=0.1068, beats_loss=0.01214, ecapa_loss=0.0001723, whisper_loss=0.09295, over 22024.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0111, ecapa_loss=0.0001841, whisper_loss=0.09365, over 3901399.15 frames. 
], batch size: 89, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:15:55,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1430820.0, ans=0.125 2024-08-12 03:16:03,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1430820.0, ans=0.2 2024-08-12 03:16:11,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1430920.0, ans=0.1 2024-08-12 03:16:38,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.672e+01 3.119e+01 3.630e+01 6.657e+01, threshold=6.239e+01, percent-clipped=2.0 2024-08-12 03:16:47,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1431120.0, ans=0.025 2024-08-12 03:16:58,980 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 03:17:05,573 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12700, loss[loss=0.1033, beats_loss=0.006844, ecapa_loss=0.000269, whisper_loss=0.09372, over 13806.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01116, ecapa_loss=0.0001841, whisper_loss=0.09345, over 3894308.61 frames. ], batch size: 62, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:17:42,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1431520.0, ans=0.2 2024-08-12 03:18:01,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1431620.0, ans=0.125 2024-08-12 03:18:17,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.16 vs. 
limit=22.5 2024-08-12 03:18:18,312 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12750, loss[loss=0.1123, beats_loss=0.01203, ecapa_loss=0.0002236, whisper_loss=0.09801, over 17337.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01111, ecapa_loss=0.0001831, whisper_loss=0.09401, over 3917485.46 frames. ], batch size: 71, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:18:18,557 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 29 from Vox, 19 fro AS 2024-08-12 03:18:30,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1431820.0, ans=0.125 2024-08-12 03:18:40,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1431920.0, ans=0.125 2024-08-12 03:18:46,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1432020.0, ans=0.2 2024-08-12 03:18:50,265 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:19:02,877 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.558e+01 2.840e+01 3.489e+01 4.506e+01, threshold=5.680e+01, percent-clipped=0.0 2024-08-12 03:19:25,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2024-08-12 03:19:29,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12800, loss[loss=0.1133, beats_loss=0.01172, ecapa_loss=0.0001882, whisper_loss=0.09965, over 22087.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001827, whisper_loss=0.09344, over 3907219.33 frames. 
], batch size: 92, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:19:34,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1432320.0, ans=0.125 2024-08-12 03:19:34,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1432320.0, ans=10.0 2024-08-12 03:19:38,822 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.653e+05 2024-08-12 03:20:09,010 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 03:20:12,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1432620.0, ans=0.07 2024-08-12 03:20:13,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.71 vs. limit=10.0 2024-08-12 03:20:27,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1432720.0, ans=0.125 2024-08-12 03:20:31,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1432720.0, ans=0.0 2024-08-12 03:20:33,953 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 03:20:35,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1432720.0, ans=0.2 2024-08-12 03:20:38,087 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 03:20:39,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12850, loss[loss=0.1172, beats_loss=0.01186, ecapa_loss=0.000193, whisper_loss=0.1035, over 15438.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01115, ecapa_loss=0.000183, whisper_loss=0.09292, over 3889888.48 frames. ], batch size: 59, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:20:39,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1432820.0, ans=0.125 2024-08-12 03:20:43,913 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 03:20:54,887 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 03:20:55,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1432920.0, ans=0.0 2024-08-12 03:21:03,383 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 03:21:22,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-12 03:21:23,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.483e+01 2.799e+01 3.175e+01 4.760e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-12 03:21:29,473 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 03:21:42,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-12 03:21:44,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1433220.0, ans=0.2 2024-08-12 03:21:48,364 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12900, loss[loss=0.1247, beats_loss=0.01088, ecapa_loss=0.0001524, whisper_loss=0.1123, over 16738.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01114, ecapa_loss=0.0001831, whisper_loss=0.09296, over 3871568.39 frames. ], batch size: 64, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:22:28,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1433620.0, ans=0.125 2024-08-12 03:22:45,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1433720.0, ans=0.125 2024-08-12 03:22:52,940 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 03:22:58,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 12950, loss[loss=0.1209, beats_loss=0.01077, ecapa_loss=0.0001845, whisper_loss=0.1083, over 22997.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01104, ecapa_loss=0.0001834, whisper_loss=0.09369, over 3853558.85 frames. ], batch size: 90, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:22:59,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1433820.0, ans=0.125 2024-08-12 03:23:03,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1433820.0, ans=0.0 2024-08-12 03:23:03,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5 2024-08-12 03:23:12,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:15,889 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. 
limit=15.0 2024-08-12 03:23:18,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1433920.0, ans=0.1 2024-08-12 03:23:19,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1433920.0, ans=0.05 2024-08-12 03:23:21,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:45,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.584e+01 3.018e+01 3.555e+01 5.734e+01, threshold=6.036e+01, percent-clipped=3.0 2024-08-12 03:23:46,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434120.0, ans=0.1 2024-08-12 03:24:09,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1434220.0, ans=0.125 2024-08-12 03:24:10,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1434320.0, ans=0.2 2024-08-12 03:24:11,211 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13000, loss[loss=0.1054, beats_loss=0.00983, ecapa_loss=0.0001741, whisper_loss=0.09387, over 21533.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.011, ecapa_loss=0.0001844, whisper_loss=0.09367, over 3875920.52 frames. 
], batch size: 86, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:24:23,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434320.0, ans=0.1 2024-08-12 03:24:32,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1434420.0, ans=0.0 2024-08-12 03:24:37,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1434420.0, ans=0.125 2024-08-12 03:24:39,863 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.688e+00 2024-08-12 03:24:40,953 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 03:24:48,574 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 03:24:52,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1434520.0, ans=0.2 2024-08-12 03:25:02,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1434620.0, ans=0.125 2024-08-12 03:25:06,574 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 03:25:06,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434620.0, ans=0.1 2024-08-12 03:25:18,379 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 03:25:24,291 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13050, loss[loss=0.1172, beats_loss=0.01337, ecapa_loss=0.0001382, whisper_loss=0.1025, over 23317.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0111, ecapa_loss=0.0001826, whisper_loss=0.09338, over 3889807.97 frames. 
], batch size: 89, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:25:42,423 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 03:25:50,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1434920.0, ans=0.1 2024-08-12 03:25:53,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1435020.0, ans=0.2 2024-08-12 03:25:56,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1435020.0, ans=0.125 2024-08-12 03:25:59,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1435020.0, ans=0.125 2024-08-12 03:26:12,498 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.574e+01 2.930e+01 3.375e+01 4.949e+01, threshold=5.859e+01, percent-clipped=0.0 2024-08-12 03:26:24,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1435220.0, ans=0.1 2024-08-12 03:26:29,060 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 03:26:41,672 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13100, loss[loss=0.1051, beats_loss=0.01153, ecapa_loss=0.0001718, whisper_loss=0.09183, over 19045.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01111, ecapa_loss=0.0001817, whisper_loss=0.0938, over 3886668.01 frames. ], batch size: 78, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:26:47,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2024-08-12 03:26:50,425 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-12 03:26:53,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-12 03:27:07,257 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 03:27:11,851 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 03:27:34,185 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 03:27:41,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1435720.0, ans=0.125 2024-08-12 03:27:46,752 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 03:27:56,457 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13150, loss[loss=0.1081, beats_loss=0.009694, ecapa_loss=0.0002019, whisper_loss=0.09636, over 18945.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001813, whisper_loss=0.09281, over 3863749.01 frames. ], batch size: 74, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:27:58,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1435820.0, ans=0.025 2024-08-12 03:28:11,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1435920.0, ans=0.125 2024-08-12 03:28:33,377 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
31 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 03:28:42,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1436120.0, ans=0.0 2024-08-12 03:28:43,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.467e+01 2.835e+01 3.173e+01 4.953e+01, threshold=5.670e+01, percent-clipped=0.0 2024-08-12 03:28:54,298 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 03:29:08,663 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13200, loss[loss=0.1126, beats_loss=0.01129, ecapa_loss=0.0002095, whisper_loss=0.09919, over 20506.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01117, ecapa_loss=0.0001819, whisper_loss=0.09194, over 3828599.51 frames. ], batch size: 84, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:29:12,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1436320.0, ans=0.0 2024-08-12 03:29:18,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-08-12 03:29:21,920 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-12 03:29:30,548 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 03:29:53,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1436620.0, ans=0.0 2024-08-12 03:30:03,925 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 03:30:12,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436720.0, ans=0.1 2024-08-12 03:30:22,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13250, loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001905, whisper_loss=0.09333, over 21796.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001823, whisper_loss=0.09201, over 3852569.71 frames. ], batch size: 89, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:30:24,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1436820.0, ans=0.2 2024-08-12 03:30:34,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1436820.0, ans=0.0 2024-08-12 03:30:39,605 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 25 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-12 03:30:39,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436920.0, ans=0.1 2024-08-12 03:30:47,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1436920.0, ans=0.0 2024-08-12 03:30:48,071 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 03:30:50,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2024-08-12 03:31:01,474 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 03:31:03,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1437020.0, ans=0.1 2024-08-12 03:31:07,327 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 03:31:10,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.496e+01 2.755e+01 3.152e+01 5.278e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-12 03:31:15,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1437120.0, ans=0.0 2024-08-12 03:31:17,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-08-12 03:31:24,828 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-12 03:31:32,802 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 03:31:36,219 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 03:31:37,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13300, loss[loss=0.1134, beats_loss=0.008508, ecapa_loss=0.0001985, whisper_loss=0.1029, over 20099.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001826, whisper_loss=0.09171, over 3829244.21 frames. ], batch size: 82, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:31:38,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.63 vs. 
limit=12.0 2024-08-12 03:32:13,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1437520.0, ans=15.0 2024-08-12 03:32:21,590 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 03:32:32,335 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 03:32:50,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13350, loss[loss=0.1071, beats_loss=0.01226, ecapa_loss=0.0001964, whisper_loss=0.09292, over 20627.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001828, whisper_loss=0.09206, over 3844432.37 frames. ], batch size: 88, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:32:54,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1437820.0, ans=0.1 2024-08-12 03:33:09,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1437920.0, ans=10.0 2024-08-12 03:33:11,175 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:33:22,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1438020.0, ans=0.0 2024-08-12 03:33:25,525 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 03:33:33,887 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 03:33:36,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1438120.0, ans=0.125 2024-08-12 03:33:37,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.851e+01 3.185e+01 1.772e+02, threshold=5.702e+01, percent-clipped=1.0 2024-08-12 03:33:38,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1438120.0, ans=0.0 2024-08-12 03:33:51,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1438220.0, ans=0.125 2024-08-12 03:34:04,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13400, loss[loss=0.1045, beats_loss=0.01301, ecapa_loss=0.0001827, whisper_loss=0.08966, over 16482.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001828, whisper_loss=0.09216, over 3838359.51 frames. ], batch size: 69, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:34:08,709 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 03:34:17,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1438420.0, ans=0.1 2024-08-12 03:34:45,555 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 03:34:49,292 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 03:34:51,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1438620.0, ans=0.125 2024-08-12 03:35:10,577 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 03:35:10,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1438720.0, ans=0.125 2024-08-12 03:35:13,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1438720.0, ans=0.0 2024-08-12 03:35:15,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13450, loss[loss=0.115, beats_loss=0.01205, ecapa_loss=0.0001882, whisper_loss=0.1011, over 22346.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001834, whisper_loss=0.09183, over 3848764.46 frames. ], batch size: 93, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:35:24,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1438820.0, ans=0.0 2024-08-12 03:35:25,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1438820.0, ans=0.0 2024-08-12 03:35:39,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1438920.0, ans=0.0 2024-08-12 03:36:01,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2024-08-12 03:36:02,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.531e+01 2.871e+01 3.206e+01 5.320e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-12 03:36:20,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1439220.0, ans=0.0 2024-08-12 03:36:27,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2024-08-12 03:36:28,088 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 03:36:29,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13500, loss[loss=0.1014, beats_loss=0.01256, ecapa_loss=0.0001482, whisper_loss=0.08732, over 22014.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001825, whisper_loss=0.09228, over 3882027.81 frames. ], batch size: 84, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:36:31,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1439320.0, ans=0.125 2024-08-12 03:37:10,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1439520.0, ans=0.0 2024-08-12 03:37:26,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2024-08-12 03:37:41,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13550, loss[loss=0.07421, beats_loss=0.01317, ecapa_loss=0.0001587, whisper_loss=0.05945, over 16822.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01104, ecapa_loss=0.000183, whisper_loss=0.09272, over 3884407.59 frames. ], batch size: 68, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:37:41,523 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 03:37:51,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. 
limit=15.0 2024-08-12 03:38:10,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1440020.0, ans=0.1 2024-08-12 03:38:19,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1440020.0, ans=0.125 2024-08-12 03:38:22,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2024-08-12 03:38:25,045 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 03:38:28,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.568e+01 2.866e+01 3.422e+01 5.610e+01, threshold=5.733e+01, percent-clipped=0.0 2024-08-12 03:38:32,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1440120.0, ans=0.0 2024-08-12 03:38:35,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1440120.0, ans=0.0 2024-08-12 03:38:36,766 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 03:38:51,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1440220.0, ans=0.125 2024-08-12 03:38:52,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1440320.0, ans=0.2 2024-08-12 03:38:53,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13600, loss[loss=0.1088, beats_loss=0.01192, ecapa_loss=0.0001853, whisper_loss=0.09502, over 22003.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01106, ecapa_loss=0.0001836, whisper_loss=0.09259, over 3867381.72 frames. 
], batch size: 90, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:38:59,387 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 03:39:23,924 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 03:39:33,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-12 03:39:34,327 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 03:39:38,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1440620.0, ans=0.125 2024-08-12 03:39:40,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1440620.0, ans=0.125 2024-08-12 03:39:51,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1440720.0, ans=0.125 2024-08-12 03:39:59,973 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 03:40:05,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13650, loss[loss=0.09638, beats_loss=0.01297, ecapa_loss=0.0002244, whisper_loss=0.08116, over 18002.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01125, ecapa_loss=0.0001826, whisper_loss=0.09136, over 3859178.06 frames. ], batch size: 78, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:40:21,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1440920.0, ans=0.125 2024-08-12 03:40:31,711 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 
23 from LS+wenet, 35 from Vox, 38 fro AS 2024-08-12 03:40:33,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1441020.0, ans=0.2 2024-08-12 03:40:40,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1441020.0, ans=0.125 2024-08-12 03:40:50,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.520e+01 2.826e+01 3.243e+01 5.319e+01, threshold=5.652e+01, percent-clipped=0.0 2024-08-12 03:40:51,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-12 03:41:11,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1441220.0, ans=0.2 2024-08-12 03:41:17,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13700, loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001793, whisper_loss=0.09044, over 16068.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01121, ecapa_loss=0.0001842, whisper_loss=0.09159, over 3864483.45 frames. ], batch size: 62, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:41:19,213 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 03:41:24,582 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 03:41:33,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1441420.0, ans=0.1 2024-08-12 03:41:36,014 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 03:41:38,771 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 03:41:40,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1441420.0, ans=0.125 2024-08-12 03:41:41,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-12 03:41:52,189 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 03:42:03,891 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 03:42:11,021 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 03:42:12,371 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 03:42:25,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-08-12 03:42:26,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1441820.0, ans=0.125 2024-08-12 03:42:27,324 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13750, loss[loss=0.1021, beats_loss=0.01199, ecapa_loss=0.0001879, whisper_loss=0.08824, over 22642.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0112, ecapa_loss=0.0001837, whisper_loss=0.0916, over 3852532.94 frames. ], batch size: 93, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:42:53,265 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 03:43:06,297 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 03:43:09,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1442120.0, ans=0.125 2024-08-12 03:43:10,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-08-12 03:43:11,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.531e+01 2.738e+01 3.278e+01 4.185e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 03:43:21,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1442120.0, ans=0.125 2024-08-12 03:43:25,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1442220.0, ans=0.125 2024-08-12 03:43:33,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2024-08-12 03:43:38,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13800, loss[loss=0.1037, beats_loss=0.01196, ecapa_loss=0.0001676, whisper_loss=0.09009, over 23631.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01116, ecapa_loss=0.0001841, whisper_loss=0.09202, over 3840930.27 frames. ], batch size: 91, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:43:44,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1442320.0, ans=0.125 2024-08-12 03:44:41,788 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 26 from Vox, 18 fro AS 2024-08-12 03:44:46,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. 
limit=15.0 2024-08-12 03:44:48,799 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 03:44:51,526 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13850, loss[loss=0.07446, beats_loss=0.01478, ecapa_loss=0.0001498, whisper_loss=0.05818, over 14434.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01108, ecapa_loss=0.0001843, whisper_loss=0.09264, over 3824984.02 frames. ], batch size: 57, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:44:55,801 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 03:44:58,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-12 03:45:05,775 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 03:45:32,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1443020.0, ans=0.125 2024-08-12 03:45:37,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1443120.0, ans=0.125 2024-08-12 03:45:38,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.591e+01 3.040e+01 3.441e+01 5.923e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-12 03:45:40,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-12 03:45:40,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1443120.0, ans=6.0 2024-08-12 03:45:51,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.47 vs. 
limit=22.5 2024-08-12 03:46:04,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13900, loss[loss=0.09968, beats_loss=0.01046, ecapa_loss=0.0001833, whisper_loss=0.08739, over 18907.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01105, ecapa_loss=0.000184, whisper_loss=0.09281, over 3852135.72 frames. ], batch size: 73, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:46:25,071 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.367e-01 2024-08-12 03:46:33,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1443520.0, ans=0.125 2024-08-12 03:46:37,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-12 03:46:50,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1443620.0, ans=0.05 2024-08-12 03:46:53,581 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 03:46:57,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-08-12 03:47:14,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 13950, loss[loss=0.09385, beats_loss=0.008058, ecapa_loss=0.0002056, whisper_loss=0.08374, over 16584.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01105, ecapa_loss=0.0001821, whisper_loss=0.09338, over 3868179.11 frames. ], batch size: 63, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:47:23,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1443820.0, ans=0.2 2024-08-12 03:47:48,631 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 03:47:59,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.550e+01 2.827e+01 3.293e+01 5.052e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 03:48:05,925 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 03:48:16,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444220.0, ans=0.1 2024-08-12 03:48:24,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14000, loss[loss=0.1068, beats_loss=0.012, ecapa_loss=0.0001756, whisper_loss=0.09304, over 22422.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01104, ecapa_loss=0.0001816, whisper_loss=0.09379, over 3867078.45 frames. ], batch size: 91, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:48:26,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444320.0, ans=0.1 2024-08-12 03:48:26,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-12 03:48:40,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1444420.0, ans=0.2 2024-08-12 03:48:41,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1444420.0, ans=0.125 2024-08-12 03:48:51,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-12 03:49:11,783 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 03:49:18,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1444620.0, ans=0.1 2024-08-12 03:49:24,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-12 03:49:34,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14050, loss[loss=0.1, beats_loss=0.01227, ecapa_loss=0.0002009, whisper_loss=0.08574, over 22249.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001829, whisper_loss=0.09301, over 3857799.88 frames. ], batch size: 92, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:49:50,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1444920.0, ans=0.2 2024-08-12 03:49:51,846 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 03:49:53,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1444920.0, ans=0.0 2024-08-12 03:49:55,972 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 03:50:05,917 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 03:50:15,341 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 03:50:19,564 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.615e+01 2.934e+01 3.537e+01 1.110e+02, threshold=5.868e+01, percent-clipped=2.0 2024-08-12 03:50:37,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1445220.0, ans=0.125 2024-08-12 03:50:38,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1445220.0, ans=0.125 2024-08-12 03:50:44,813 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14100, loss[loss=0.1095, beats_loss=0.008993, ecapa_loss=0.0001929, whisper_loss=0.09858, over 16148.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01105, ecapa_loss=0.000184, whisper_loss=0.09362, over 3855189.91 frames. ], batch size: 64, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:50:46,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1445320.0, ans=0.2 2024-08-12 03:51:00,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.33 vs. limit=22.5 2024-08-12 03:51:10,842 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 26 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-12 03:51:23,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1445520.0, ans=0.04949747468305833 2024-08-12 03:51:31,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1445620.0, ans=0.125 2024-08-12 03:51:40,865 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
16 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-12 03:51:53,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14150, loss[loss=0.1132, beats_loss=0.01056, ecapa_loss=0.0001923, whisper_loss=0.1007, over 22022.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001834, whisper_loss=0.09312, over 3836835.82 frames. ], batch size: 91, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:51:54,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1445820.0, ans=0.125 2024-08-12 03:51:54,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-12 03:52:01,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1445820.0, ans=0.2 2024-08-12 03:52:15,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1445920.0, ans=0.1 2024-08-12 03:52:20,617 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 03:52:20,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1446020.0, ans=0.2 2024-08-12 03:52:30,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1446020.0, ans=0.0 2024-08-12 03:52:36,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.480e+01 2.708e+01 3.118e+01 5.988e+01, threshold=5.416e+01, percent-clipped=1.0 2024-08-12 03:52:42,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.79 vs. 
limit=22.5 2024-08-12 03:52:51,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1446220.0, ans=0.0 2024-08-12 03:53:02,295 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14200, loss[loss=0.1105, beats_loss=0.01243, ecapa_loss=0.0001934, whisper_loss=0.09613, over 22202.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01117, ecapa_loss=0.0001822, whisper_loss=0.09308, over 3873513.41 frames. ], batch size: 90, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:53:09,903 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-12 03:53:20,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1446420.0, ans=0.125 2024-08-12 03:53:21,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1446420.0, ans=0.1 2024-08-12 03:53:23,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1446420.0, ans=0.125 2024-08-12 03:53:25,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1446420.0, ans=0.0 2024-08-12 03:53:39,709 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 03:53:48,206 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 03:54:12,781 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14250, loss[loss=0.08946, beats_loss=0.01072, ecapa_loss=0.0002342, whisper_loss=0.07639, over 21417.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01116, ecapa_loss=0.0001816, whisper_loss=0.09365, over 3902419.22 frames. 
], batch size: 94, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:54:36,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446920.0, ans=0.1 2024-08-12 03:54:53,514 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 03:54:58,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.831e+01 3.136e+01 3.486e+01 5.154e+01, threshold=6.272e+01, percent-clipped=0.0 2024-08-12 03:55:03,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1447120.0, ans=0.0 2024-08-12 03:55:06,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=12.0 2024-08-12 03:55:15,983 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 03:55:23,948 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14300, loss[loss=0.08302, beats_loss=0.01445, ecapa_loss=0.0001775, whisper_loss=0.06679, over 22201.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01114, ecapa_loss=0.0001808, whisper_loss=0.09325, over 3893227.65 frames. ], batch size: 93, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:55:27,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1447320.0, ans=0.125 2024-08-12 03:55:32,717 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 03:55:38,045 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 03:55:49,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. 
limit=15.0 2024-08-12 03:56:00,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1447520.0, ans=0.2 2024-08-12 03:56:09,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1447620.0, ans=0.09899494936611666 2024-08-12 03:56:11,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-12 03:56:22,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1447720.0, ans=0.125 2024-08-12 03:56:32,034 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14350, loss[loss=0.08234, beats_loss=0.01272, ecapa_loss=0.000183, whisper_loss=0.06779, over 21691.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01115, ecapa_loss=0.0001805, whisper_loss=0.09262, over 3904233.92 frames. ], batch size: 89, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:56:35,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.23 vs. limit=12.0 2024-08-12 03:56:47,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1447920.0, ans=0.0 2024-08-12 03:56:50,165 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
12 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 03:56:51,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1447920.0, ans=0.0 2024-08-12 03:56:53,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1447920.0, ans=0.0 2024-08-12 03:57:17,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.654e+01 2.989e+01 3.360e+01 6.544e+01, threshold=5.979e+01, percent-clipped=1.0 2024-08-12 03:57:24,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1448120.0, ans=0.125 2024-08-12 03:57:29,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1448220.0, ans=0.0 2024-08-12 03:57:35,860 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 03:57:43,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14400, loss[loss=0.1013, beats_loss=0.01433, ecapa_loss=0.0001777, whisper_loss=0.08515, over 17883.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01113, ecapa_loss=0.0001824, whisper_loss=0.09305, over 3909350.34 frames. ], batch size: 72, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:58:01,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1448420.0, ans=0.2 2024-08-12 03:58:05,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1448420.0, ans=0.5 2024-08-12 03:58:15,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1448520.0, ans=0.125 2024-08-12 03:58:26,073 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 03:58:30,114 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:58:32,561 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 03:58:42,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1448720.0, ans=0.125 2024-08-12 03:58:45,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1448720.0, ans=0.2 2024-08-12 03:58:45,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5 2024-08-12 03:58:48,163 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-12 03:58:51,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1448820.0, ans=0.2 2024-08-12 03:58:52,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 10, batch 14450, loss[loss=0.1116, beats_loss=0.009716, ecapa_loss=0.0002038, whisper_loss=0.09987, over 16630.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01111, ecapa_loss=0.0001826, whisper_loss=0.09281, over 3898209.95 frames. ], batch size: 66, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:59:24,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1449020.0, ans=0.125 2024-08-12 03:59:28,507 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 03:59:34,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.542e+01 2.850e+01 3.301e+01 1.207e+02, threshold=5.700e+01, percent-clipped=1.0 2024-08-12 04:00:35,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 0, loss[loss=0.07611, beats_loss=0.01167, ecapa_loss=0.0001729, whisper_loss=0.06271, over 14928.00 frames. ], tot_loss[loss=0.07611, beats_loss=0.01167, ecapa_loss=0.0001729, whisper_loss=0.06271, over 14928.00 frames. ], batch size: 59, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:00:35,391 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 04:01:15,795 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0005978, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 04:01:28,816 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1239, 1.5143, 2.0429, 1.5991, 0.9136, 1.9295, 1.8068, 1.0172], device='cuda:1') 2024-08-12 04:01:31,060 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on SV_voxceleb1: loss=0.004953, beats_loss=0, ecapa_loss=0.0004953, whisper_loss=0, over 939242.00 frames. 2024-08-12 04:03:26,136 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8775, 2.1080, 1.6275, 1.4300, 1.5988, 1.5519, 2.0394, 1.8631], device='cuda:1') 2024-08-12 04:03:27,037 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on AT_audioset: loss=0.02449, beats_loss=0.02449, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 04:03:27,040 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 04:03:36,558 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 04:03:43,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1449260.0, ans=0.0 2024-08-12 04:03:46,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1449260.0, ans=0.125 2024-08-12 04:03:48,250 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 04:04:19,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1449460.0, ans=0.1 2024-08-12 04:04:22,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2024-08-12 04:04:24,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1449460.0, ans=0.0 2024-08-12 04:04:35,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1449460.0, ans=0.0 2024-08-12 04:04:35,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-12 04:05:02,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1449560.0, ans=0.125 2024-08-12 04:05:04,477 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 04:05:09,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1449660.0, ans=0.2 2024-08-12 04:05:12,438 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 04:05:12,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1449660.0, ans=0.125 2024-08-12 04:05:24,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1449660.0, ans=0.0 2024-08-12 04:05:33,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 50, loss[loss=0.1166, beats_loss=0.008514, ecapa_loss=0.0001958, whisper_loss=0.1061, over 21044.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001876, whisper_loss=0.08876, over 860276.52 frames. ], batch size: 82, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:05:39,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1449760.0, ans=0.125 2024-08-12 04:05:52,726 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 04:06:11,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1449860.0, ans=0.2 2024-08-12 04:06:30,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-12 04:06:41,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1449960.0, ans=0.1 2024-08-12 04:07:07,511 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 04:07:07,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1450160.0, ans=0.125 2024-08-12 04:07:08,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.961e+01 3.212e+01 3.624e+01 5.944e+01, threshold=6.424e+01, percent-clipped=1.0 2024-08-12 04:07:13,190 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 34 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 04:07:30,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 100, loss[loss=0.08767, beats_loss=0.01206, ecapa_loss=0.0001707, whisper_loss=0.0739, over 21284.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01043, ecapa_loss=0.0001853, whisper_loss=0.09155, over 1542491.63 frames. ], batch size: 83, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:07:39,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1450260.0, ans=0.1 2024-08-12 04:08:32,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1450460.0, ans=0.125 2024-08-12 04:08:39,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1450460.0, ans=0.0 2024-08-12 04:08:54,447 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 04:09:01,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1450560.0, ans=0.0 2024-08-12 04:09:35,003 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 04:09:55,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 150, loss[loss=0.09003, beats_loss=0.01367, ecapa_loss=0.000137, whisper_loss=0.07499, over 20576.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01027, ecapa_loss=0.0001845, whisper_loss=0.09252, over 2055980.07 frames. ], batch size: 83, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:10:41,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1450860.0, ans=0.125 2024-08-12 04:11:11,832 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 04:11:11,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1451060.0, ans=0.0 2024-08-12 04:11:16,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1451060.0, ans=0.0 2024-08-12 04:11:20,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2024-08-12 04:11:38,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.724e+01 3.107e+01 3.626e+01 6.235e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-12 04:12:04,620 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 200, loss[loss=0.1259, beats_loss=0.008758, ecapa_loss=0.0001815, whisper_loss=0.1154, over 21880.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0104, ecapa_loss=0.000184, whisper_loss=0.09293, over 2443715.42 frames. ], batch size: 84, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:12:09,750 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 20 from Vox, 38 from AS 2024-08-12 04:12:32,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-12 04:12:33,053 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts.
14 from LS+wenet, 18 from Vox, 24 from AS 2024-08-12 04:12:47,393 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 from AS 2024-08-12 04:12:52,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-12 04:13:03,927 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:13:11,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2024-08-12 04:13:12,279 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 36 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 04:13:27,372 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 41 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 04:13:38,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1451660.0, ans=0.125 2024-08-12 04:14:04,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 250, loss[loss=0.0957, beats_loss=0.01153, ecapa_loss=0.0001588, whisper_loss=0.08258, over 16095.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01052, ecapa_loss=0.0001826, whisper_loss=0.09284, over 2752469.45 frames. ], batch size: 62, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:14:23,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451760.0, ans=0.1 2024-08-12 04:14:31,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs.
limit=15.0 2024-08-12 04:14:36,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1451860.0, ans=0.95 2024-08-12 04:14:50,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1451960.0, ans=10.0 2024-08-12 04:15:07,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2024-08-12 04:15:30,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452060.0, ans=0.125 2024-08-12 04:15:40,858 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:15:41,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.465e+01 2.658e+01 3.015e+01 5.855e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-12 04:15:45,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1452160.0, ans=0.2 2024-08-12 04:15:45,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1452160.0, ans=0.1 2024-08-12 04:15:47,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1452160.0, ans=0.025 2024-08-12 04:15:59,477 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 from AS 2024-08-12 04:16:03,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 300, loss[loss=0.1084, beats_loss=0.01037, ecapa_loss=0.0001908, whisper_loss=0.0961, over 22470.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0107, ecapa_loss=0.0001835, whisper_loss=0.09256, over 3002320.87 frames.
], batch size: 91, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:16:09,968 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 30 from Vox, 27 from AS 2024-08-12 04:16:14,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1452260.0, ans=0.0 2024-08-12 04:16:19,435 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 from AS 2024-08-12 04:17:14,682 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 350, loss[loss=0.1169, beats_loss=0.009937, ecapa_loss=0.0001851, whisper_loss=0.1051, over 17232.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001814, whisper_loss=0.09159, over 3176112.22 frames. ], batch size: 69, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:17:21,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1452760.0, ans=0.125 2024-08-12 04:17:31,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1452860.0, ans=0.2 2024-08-12 04:17:39,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-12 04:18:02,770 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts.
22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-12 04:18:15,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.542e+01 2.799e+01 3.205e+01 6.505e+01, threshold=5.597e+01, percent-clipped=2.0 2024-08-12 04:18:20,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1453160.0, ans=0.125 2024-08-12 04:18:28,531 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 400, loss[loss=0.09877, beats_loss=0.01135, ecapa_loss=0.0001971, whisper_loss=0.08545, over 21702.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001801, whisper_loss=0.0914, over 3310317.66 frames. ], batch size: 89, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:18:34,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2024-08-12 04:18:34,417 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 from AS 2024-08-12 04:18:36,065 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 16 from LS+wenet, 32 from Vox, 32 from AS 2024-08-12 04:18:58,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1453460.0, ans=0.125 2024-08-12 04:18:58,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-12 04:19:02,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1453460.0, ans=0.2 2024-08-12 04:19:21,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1453560.0, ans=0.1 2024-08-12 04:19:28,122 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts.
19 from LS+wenet, 7 from Vox, 28 from AS 2024-08-12 04:19:36,616 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 30 from Vox, 31 from AS 2024-08-12 04:19:40,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 450, loss[loss=0.08432, beats_loss=0.01205, ecapa_loss=0.0001915, whisper_loss=0.07035, over 18872.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001797, whisper_loss=0.091, over 3401703.68 frames. ], batch size: 77, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:19:52,204 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 from AS 2024-08-12 04:20:03,942 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 04:20:21,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1453960.0, ans=0.125 2024-08-12 04:20:22,316 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 from AS 2024-08-12 04:20:41,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.543e+01 2.883e+01 3.316e+01 4.776e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-12 04:20:43,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1454160.0, ans=0.0 2024-08-12 04:20:50,453 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 04:20:52,136 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 from AS 2024-08-12 04:20:53,473 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 13 from Vox, 36 from AS 2024-08-12 04:20:54,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 500, loss[loss=0.1067, beats_loss=0.01197, ecapa_loss=0.0001485, whisper_loss=0.09324, over 19478.00 frames.
], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001793, whisper_loss=0.09084, over 3494137.76 frames. ], batch size: 74, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:20:59,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1454260.0, ans=0.125 2024-08-12 04:21:03,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1454260.0, ans=0.035 2024-08-12 04:21:12,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1454360.0, ans=0.07 2024-08-12 04:21:21,908 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:21:29,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1454460.0, ans=0.125 2024-08-12 04:21:39,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=12.0 2024-08-12 04:21:53,103 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 04:22:02,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1454660.0, ans=0.125 2024-08-12 04:22:09,092 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 550, loss[loss=0.1061, beats_loss=0.01073, ecapa_loss=0.0001936, whisper_loss=0.09344, over 22727.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001804, whisper_loss=0.09209, over 3587952.61 frames. ], batch size: 88, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:22:16,180 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts.
32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-12 04:22:19,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-12 04:22:27,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:27,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:28,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1454860.0, ans=0.2 2024-08-12 04:22:32,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1454860.0, ans=0.0 2024-08-12 04:22:33,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1454860.0, ans=0.0 2024-08-12 04:22:37,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1454960.0, ans=0.2 2024-08-12 04:22:39,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-12 04:22:43,892 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 from AS 2024-08-12 04:22:59,892 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS 2024-08-12 04:23:08,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.603e+01 2.842e+01 3.155e+01 5.740e+01, threshold=5.685e+01, percent-clipped=0.0 2024-08-12 04:23:10,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.74 vs.
limit=10.0 2024-08-12 04:23:21,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 600, loss[loss=0.1081, beats_loss=0.01038, ecapa_loss=0.0002054, whisper_loss=0.09566, over 22811.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001795, whisper_loss=0.09158, over 3619879.89 frames. ], batch size: 93, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:23:37,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1455360.0, ans=0.07 2024-08-12 04:23:38,391 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 from AS 2024-08-12 04:23:40,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1455360.0, ans=0.125 2024-08-12 04:23:45,956 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 16 from Vox, 38 from AS 2024-08-12 04:23:57,354 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 from AS 2024-08-12 04:24:21,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1455660.0, ans=0.2 2024-08-12 04:24:28,403 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 29 from LS+wenet, 13 from Vox, 35 from AS 2024-08-12 04:24:31,499 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 30 from Vox, 33 from AS 2024-08-12 04:24:35,341 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 650, loss[loss=0.08281, beats_loss=0.01271, ecapa_loss=0.0001456, whisper_loss=0.06864, over 17081.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001796, whisper_loss=0.09194, over 3713009.20 frames. ], batch size: 65, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:24:48,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs.
limit=5.0 2024-08-12 04:24:57,668 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 04:25:20,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2024-08-12 04:25:27,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1456060.0, ans=0.125 2024-08-12 04:25:35,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.475e+01 2.766e+01 3.282e+01 4.630e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 04:25:37,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1456160.0, ans=0.125 2024-08-12 04:25:48,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 700, loss[loss=0.09025, beats_loss=0.01202, ecapa_loss=0.0001498, whisper_loss=0.07674, over 18731.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001781, whisper_loss=0.09212, over 3753637.51 frames. ], batch size: 72, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:25:53,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-12 04:25:59,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1456260.0, ans=0.125 2024-08-12 04:26:08,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2024-08-12 04:26:24,600 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-12 04:26:32,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1456460.0, ans=0.2 2024-08-12 04:26:42,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1456560.0, ans=0.2 2024-08-12 04:26:51,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1456660.0, ans=0.0 2024-08-12 04:27:07,387 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 750, loss[loss=0.1154, beats_loss=0.01127, ecapa_loss=0.0001983, whisper_loss=0.1021, over 18093.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001774, whisper_loss=0.09232, over 3787399.02 frames. ], batch size: 75, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:27:09,652 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 from AS 2024-08-12 04:27:31,496 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 04:27:45,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-08-12 04:27:54,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1456960.0, ans=0.0 2024-08-12 04:28:02,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1457060.0, ans=0.0 2024-08-12 04:28:02,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs.
limit=15.0 2024-08-12 04:28:16,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.543e+01 2.919e+01 3.268e+01 8.785e+01, threshold=5.838e+01, percent-clipped=1.0 2024-08-12 04:28:27,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1457160.0, ans=0.1 2024-08-12 04:28:29,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1457160.0, ans=0.0 2024-08-12 04:28:32,367 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 800, loss[loss=0.0889, beats_loss=0.01137, ecapa_loss=0.0001551, whisper_loss=0.07598, over 14196.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001773, whisper_loss=0.09162, over 3819058.03 frames. ], batch size: 53, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:28:38,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1457260.0, ans=0.125 2024-08-12 04:28:39,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1457260.0, ans=0.125 2024-08-12 04:28:45,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1457260.0, ans=0.125 2024-08-12 04:28:56,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1457360.0, ans=0.0 2024-08-12 04:29:01,089 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
16 from LS+wenet, 23 from Vox, 34 from AS 2024-08-12 04:29:07,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1457460.0, ans=0.125 2024-08-12 04:29:23,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-08-12 04:29:25,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1457560.0, ans=0.125 2024-08-12 04:29:36,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1457660.0, ans=0.2 2024-08-12 04:29:47,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-12 04:29:48,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=12.0 2024-08-12 04:29:52,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 850, loss[loss=0.1221, beats_loss=0.008149, ecapa_loss=0.0002231, whisper_loss=0.1117, over 21394.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01088, ecapa_loss=0.0001779, whisper_loss=0.09236, over 3830998.13 frames. ], batch size: 87, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:30:10,750 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 from AS 2024-08-12 04:30:36,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs.
limit=15.0 2024-08-12 04:30:39,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1457960.0, ans=0.1 2024-08-12 04:30:41,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.47 vs. limit=10.0 2024-08-12 04:30:45,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1458060.0, ans=0.0 2024-08-12 04:30:54,714 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 04:30:57,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.636e+01 2.987e+01 3.471e+01 7.869e+01, threshold=5.974e+01, percent-clipped=5.0 2024-08-12 04:30:59,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1458160.0, ans=0.0 2024-08-12 04:31:10,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1458260.0, ans=0.125 2024-08-12 04:31:10,954 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 900, loss[loss=0.09479, beats_loss=0.01134, ecapa_loss=0.0001593, whisper_loss=0.08186, over 23205.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001764, whisper_loss=0.09193, over 3822479.43 frames.
], batch size: 93, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:31:21,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1458260.0, ans=0.0 2024-08-12 04:31:30,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1458360.0, ans=0.2 2024-08-12 04:32:07,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-12 04:32:16,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1458660.0, ans=0.125 2024-08-12 04:32:25,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1458660.0, ans=0.05 2024-08-12 04:32:29,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1458660.0, ans=0.125 2024-08-12 04:32:34,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 950, loss[loss=0.1015, beats_loss=0.01214, ecapa_loss=0.0002252, whisper_loss=0.08711, over 21607.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.000177, whisper_loss=0.0917, over 3852734.11 frames. ], batch size: 90, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:33:11,691 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 from AS 2024-08-12 04:33:12,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1458960.0, ans=0.0 2024-08-12 04:33:13,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1458960.0, ans=0.0 2024-08-12 04:33:28,215 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
26 from LS+wenet, 20 from Vox, 45 from AS 2024-08-12 04:33:44,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.653e+01 2.939e+01 3.386e+01 4.997e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-12 04:33:45,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1459160.0, ans=0.125 2024-08-12 04:34:00,936 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1000, loss[loss=0.09543, beats_loss=0.01137, ecapa_loss=0.0001618, whisper_loss=0.08244, over 15070.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001774, whisper_loss=0.09172, over 3842629.14 frames. ], batch size: 59, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:34:36,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1459460.0, ans=0.125 2024-08-12 04:34:42,129 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 04:34:53,914 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 25 from Vox, 27 from AS 2024-08-12 04:35:12,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2024-08-12 04:35:12,978 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 24 from Vox, 20 from AS 2024-08-12 04:35:16,688 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-12 04:35:21,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1050, loss[loss=0.1114, beats_loss=0.01242, ecapa_loss=0.0001476, whisper_loss=0.09749, over 21043.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01105, ecapa_loss=0.0001756, whisper_loss=0.09084, over 3824063.33 frames.
], batch size: 80, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:35:46,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1459860.0, ans=0.125 2024-08-12 04:35:56,557 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 18 from Vox, 42 from AS 2024-08-12 04:36:01,525 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-12 04:36:07,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1459960.0, ans=0.0 2024-08-12 04:36:23,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1460060.0, ans=0.0 2024-08-12 04:36:24,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1460060.0, ans=0.125 2024-08-12 04:36:33,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.762e+01 2.974e+01 3.480e+01 4.829e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-12 04:36:46,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1460160.0, ans=0.2 2024-08-12 04:36:48,810 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1100, loss[loss=0.116, beats_loss=0.01262, ecapa_loss=0.0001086, whisper_loss=0.1023, over 24869.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01108, ecapa_loss=0.000175, whisper_loss=0.09112, over 3841613.01 frames. ], batch size: 91, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:36:51,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=22.5 2024-08-12 04:36:51,664 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts.
16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 04:36:56,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1460260.0, ans=0.125 2024-08-12 04:36:59,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1460260.0, ans=0.2 2024-08-12 04:37:03,451 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 04:37:19,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1460460.0, ans=0.0 2024-08-12 04:37:26,621 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 04:37:45,858 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 04:37:49,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-12 04:37:54,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1460560.0, ans=0.0 2024-08-12 04:38:10,677 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 10 from Vox, 27 from AS 2024-08-12 04:38:12,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1460760.0, ans=0.1 2024-08-12 04:38:13,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1150, loss[loss=0.1252, beats_loss=0.007887, ecapa_loss=0.0001505, whisper_loss=0.1158, over 16978.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01104, ecapa_loss=0.0001751, whisper_loss=0.0908, over 3820124.66 frames.
], batch size: 60, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:38:22,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-12 04:38:24,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1460760.0, ans=0.0 2024-08-12 04:38:41,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=12.0 2024-08-12 04:39:01,288 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 04:39:19,528 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.588e+01 2.774e+01 3.143e+01 5.777e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 04:39:33,689 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1200, loss[loss=0.1115, beats_loss=0.01054, ecapa_loss=0.0001874, whisper_loss=0.09906, over 21954.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001751, whisper_loss=0.09139, over 3784510.80 frames. ], batch size: 89, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:39:43,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1461260.0, ans=0.1 2024-08-12 04:39:53,007 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 04:40:09,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1461460.0, ans=0.125 2024-08-12 04:40:14,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1461460.0, ans=0.125 2024-08-12 04:40:40,973 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-12 04:40:47,829 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 04:40:54,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1461660.0, ans=0.05 2024-08-12 04:40:57,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1250, loss[loss=0.07474, beats_loss=0.0122, ecapa_loss=0.0001644, whisper_loss=0.0609, over 18652.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01102, ecapa_loss=0.0001754, whisper_loss=0.0911, over 3822148.22 frames. ], batch size: 75, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:41:04,044 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 04:41:23,379 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 04:41:32,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0 2024-08-12 04:41:37,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.29 vs. 
limit=15.0 2024-08-12 04:41:41,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1461960.0, ans=0.125 2024-08-12 04:41:42,520 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 04:41:51,881 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 04:42:03,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1462060.0, ans=0.07 2024-08-12 04:42:08,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.564e+01 2.833e+01 3.209e+01 5.019e+01, threshold=5.666e+01, percent-clipped=0.0 2024-08-12 04:42:14,576 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 04:42:18,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1462160.0, ans=0.0 2024-08-12 04:42:24,037 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1300, loss[loss=0.1044, beats_loss=0.01035, ecapa_loss=0.0001944, whisper_loss=0.09213, over 19207.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.000175, whisper_loss=0.09119, over 3819346.17 frames. ], batch size: 77, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:42:43,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1462360.0, ans=0.125 2024-08-12 04:42:46,255 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 04:42:57,243 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.126e-01 2024-08-12 04:43:01,469 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 04:43:19,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0 2024-08-12 04:43:32,182 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 04:43:41,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1462660.0, ans=0.0 2024-08-12 04:43:45,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462760.0, ans=0.1 2024-08-12 04:43:46,686 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1350, loss[loss=0.1034, beats_loss=0.0132, ecapa_loss=0.0001495, whisper_loss=0.08871, over 21729.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001739, whisper_loss=0.09135, over 3872026.31 frames. ], batch size: 86, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:44:08,154 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 04:44:09,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=12.0 2024-08-12 04:44:31,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1462960.0, ans=0.0 2024-08-12 04:44:33,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. 
limit=22.5 2024-08-12 04:44:53,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1463060.0, ans=0.1 2024-08-12 04:44:55,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1463160.0, ans=0.125 2024-08-12 04:44:58,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.596e+01 2.848e+01 3.248e+01 6.741e+01, threshold=5.696e+01, percent-clipped=1.0 2024-08-12 04:44:59,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1463160.0, ans=0.125 2024-08-12 04:45:10,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1463260.0, ans=0.07 2024-08-12 04:45:11,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1400, loss[loss=0.1257, beats_loss=0.01052, ecapa_loss=0.000191, whisper_loss=0.1133, over 22755.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0111, ecapa_loss=0.0001747, whisper_loss=0.09104, over 3886604.43 frames. 
], batch size: 89, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:45:12,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1463260.0, ans=0.125 2024-08-12 04:46:02,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1463560.0, ans=0.0 2024-08-12 04:46:08,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1463560.0, ans=0.125 2024-08-12 04:46:08,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1463560.0, ans=0.125 2024-08-12 04:46:59,690 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1450, loss[loss=0.1237, beats_loss=0.01007, ecapa_loss=0.000182, whisper_loss=0.1118, over 23690.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001747, whisper_loss=0.09123, over 3865503.65 frames. ], batch size: 94, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:47:04,991 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 04:47:08,367 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 04:47:15,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1463860.0, ans=0.025 2024-08-12 04:47:30,568 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-12 04:47:39,929 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 04:47:47,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1464060.0, ans=0.035 2024-08-12 04:47:47,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1464060.0, ans=0.125 2024-08-12 04:48:03,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464160.0, ans=0.1 2024-08-12 04:48:05,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.800e+01 3.262e+01 9.547e+01, threshold=5.600e+01, percent-clipped=2.0 2024-08-12 04:48:09,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464160.0, ans=0.1 2024-08-12 04:48:11,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-08-12 04:48:20,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1500, loss[loss=0.1017, beats_loss=0.007118, ecapa_loss=0.000169, whisper_loss=0.09287, over 15198.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01096, ecapa_loss=0.0001758, whisper_loss=0.09049, over 3828592.04 frames. ], batch size: 54, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:48:22,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464260.0, ans=0.1 2024-08-12 04:48:25,847 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 04:48:39,021 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 04:48:44,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464360.0, ans=0.1 2024-08-12 04:48:44,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464360.0, ans=0.1 2024-08-12 04:48:52,010 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 04:49:05,735 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 04:49:12,160 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 04:49:38,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1464660.0, ans=0.125 2024-08-12 04:49:38,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1464660.0, ans=0.2 2024-08-12 04:49:39,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=12.0 2024-08-12 04:49:40,886 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1550, loss[loss=0.08633, beats_loss=0.01057, ecapa_loss=0.0001269, whisper_loss=0.07449, over 16517.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.000174, whisper_loss=0.09097, over 3829520.36 frames. ], batch size: 61, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:50:08,456 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 04:50:17,807 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
14 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 04:50:19,480 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:50:28,483 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 04:50:39,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5 2024-08-12 04:50:42,464 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 04:50:44,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1465160.0, ans=0.5 2024-08-12 04:50:45,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.381e+01 2.640e+01 3.042e+01 4.916e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-12 04:50:59,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1600, loss[loss=0.09735, beats_loss=0.0102, ecapa_loss=0.0001646, whisper_loss=0.08551, over 19172.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001738, whisper_loss=0.09112, over 3816951.51 frames. ], batch size: 73, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:51:05,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-12 04:51:11,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. 
limit=8.0 2024-08-12 04:51:14,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1465360.0, ans=0.0 2024-08-12 04:51:31,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1465460.0, ans=0.125 2024-08-12 04:51:54,499 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 16 from LS+wenet, 35 from Vox, 43 fro AS 2024-08-12 04:52:16,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1650, loss[loss=0.09198, beats_loss=0.01191, ecapa_loss=0.0001683, whisper_loss=0.07839, over 18871.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01098, ecapa_loss=0.0001741, whisper_loss=0.09068, over 3835359.66 frames. ], batch size: 77, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:52:17,240 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-12 04:52:27,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1465760.0, ans=0.1 2024-08-12 04:52:27,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1465760.0, ans=0.07 2024-08-12 04:52:29,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1465760.0, ans=0.125 2024-08-12 04:52:41,514 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 04:52:44,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1465860.0, ans=0.125 2024-08-12 04:53:04,239 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
13 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 04:53:08,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1466060.0, ans=0.125 2024-08-12 04:53:19,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.459e+01 2.653e+01 3.242e+01 4.506e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-12 04:53:27,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1466160.0, ans=0.2 2024-08-12 04:53:28,719 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 04:53:33,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1700, loss[loss=0.1072, beats_loss=0.01201, ecapa_loss=0.0001187, whisper_loss=0.09403, over 23706.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.0001733, whisper_loss=0.09186, over 3862113.24 frames. ], batch size: 90, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:53:35,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.13 vs. limit=10.0 2024-08-12 04:53:36,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1466260.0, ans=0.2 2024-08-12 04:53:38,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-12 04:53:43,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-08-12 04:53:44,834 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
14 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 04:53:49,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1466360.0, ans=0.125 2024-08-12 04:53:58,514 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 04:54:27,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1466560.0, ans=0.125 2024-08-12 04:54:43,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1466660.0, ans=0.0 2024-08-12 04:54:50,301 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1750, loss[loss=0.09244, beats_loss=0.01075, ecapa_loss=0.0001763, whisper_loss=0.07992, over 14022.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001733, whisper_loss=0.09155, over 3849679.56 frames. ], batch size: 53, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:55:09,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1466860.0, ans=0.1 2024-08-12 04:55:17,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1466860.0, ans=0.125 2024-08-12 04:55:25,366 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.075e-01 2024-08-12 04:55:25,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1466960.0, ans=12.0 2024-08-12 04:55:31,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1466960.0, ans=0.2 2024-08-12 04:55:33,413 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
25 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 04:55:45,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=22.5 2024-08-12 04:55:51,198 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 04:55:53,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.424e+01 2.723e+01 3.040e+01 5.517e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-12 04:55:54,164 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 04:56:04,828 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 04:56:07,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1800, loss[loss=0.08233, beats_loss=0.01468, ecapa_loss=0.0001413, whisper_loss=0.06624, over 18150.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01098, ecapa_loss=0.0001719, whisper_loss=0.09071, over 3832250.23 frames. ], batch size: 73, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:56:09,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1467260.0, ans=0.125 2024-08-12 04:56:29,346 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 04:56:47,255 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 04:56:54,596 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 04:57:00,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1467560.0, ans=0.09899494936611666 2024-08-12 04:57:24,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1850, loss[loss=0.106, beats_loss=0.009487, ecapa_loss=0.0001838, whisper_loss=0.09472, over 18870.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01094, ecapa_loss=0.0001729, whisper_loss=0.09079, over 3820054.76 frames. ], batch size: 74, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:57:28,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1467760.0, ans=0.1 2024-08-12 04:57:37,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1467760.0, ans=0.0 2024-08-12 04:57:43,672 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-08-12 04:57:50,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=12.0 2024-08-12 04:57:54,971 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 04:58:10,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1468060.0, ans=0.0 2024-08-12 04:58:13,234 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 27 from LS+wenet, 13 from Vox, 14 fro AS 2024-08-12 04:58:21,926 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 04:58:27,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.532e+01 2.817e+01 3.253e+01 1.073e+02, threshold=5.635e+01, percent-clipped=1.0 2024-08-12 04:58:27,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1468160.0, ans=0.0 2024-08-12 04:58:41,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1900, loss[loss=0.09375, beats_loss=0.01084, ecapa_loss=0.0001616, whisper_loss=0.08129, over 14565.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001755, whisper_loss=0.09151, over 3819750.65 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:59:15,471 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 04:59:21,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1468460.0, ans=0.125 2024-08-12 04:59:34,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1468560.0, ans=0.125 2024-08-12 04:59:40,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1468560.0, ans=0.0 2024-08-12 04:59:44,930 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 19 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 04:59:46,304 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 04:59:54,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1468660.0, ans=0.1 2024-08-12 04:59:57,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-12 04:59:59,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 1950, loss[loss=0.09466, beats_loss=0.01281, ecapa_loss=0.0002071, whisper_loss=0.07977, over 21614.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001766, whisper_loss=0.09102, over 3804310.66 frames. ], batch size: 89, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:00:04,419 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 05:00:26,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-08-12 05:00:31,153 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 05:00:41,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1468960.0, ans=0.2 2024-08-12 05:01:01,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.456e+01 2.694e+01 2.989e+01 6.245e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-12 05:01:14,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1469260.0, ans=0.2 2024-08-12 05:01:15,801 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2000, loss[loss=0.08974, beats_loss=0.01125, ecapa_loss=0.0001776, whisper_loss=0.07671, over 18423.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001778, whisper_loss=0.09171, over 3825266.68 frames. 
], batch size: 74, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:01:31,819 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 05:01:52,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1469460.0, ans=0.125 2024-08-12 05:01:54,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1469460.0, ans=0.125 2024-08-12 05:02:18,846 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 05:02:21,948 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 05:02:22,241 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:02:34,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2050, loss[loss=0.08761, beats_loss=0.01094, ecapa_loss=0.0002133, whisper_loss=0.07454, over 16601.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01102, ecapa_loss=0.0001769, whisper_loss=0.09023, over 3804049.71 frames. ], batch size: 70, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:02:38,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2024-08-12 05:02:41,755 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 05:02:51,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1469860.0, ans=0.1 2024-08-12 05:02:52,686 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 05:03:08,002 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
15 from LS+wenet, 19 from Vox, 27 from AS
2024-08-12 05:03:13,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1469960.0, ans=0.0
2024-08-12 05:03:17,211 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS
2024-08-12 05:03:26,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1470060.0, ans=0.125
2024-08-12 05:03:29,372 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 from AS
2024-08-12 05:03:31,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0
2024-08-12 05:03:35,581 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.015e+00
2024-08-12 05:03:37,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.542e+01 2.738e+01 3.129e+01 4.867e+01, threshold=5.477e+01, percent-clipped=0.0
2024-08-12 05:03:46,468 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 19 from LS+wenet, 25 from Vox, 51 from AS
2024-08-12 05:03:50,760 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2100, loss[loss=0.09662, beats_loss=0.01159, ecapa_loss=0.0001899, whisper_loss=0.08313, over 21920.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01111, ecapa_loss=0.0001757, whisper_loss=0.08976, over 3790788.85 frames. ], batch size: 91, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 05:03:50,933 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 05:03:54,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.80 vs. limit=10.0
2024-08-12 05:04:03,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1470260.0, ans=0.2
2024-08-12 05:04:03,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1470260.0, ans=15.0
2024-08-12 05:04:21,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1470460.0, ans=0.0
2024-08-12 05:04:37,903 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 from AS
2024-08-12 05:04:44,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1470560.0, ans=0.125
2024-08-12 05:05:07,974 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2150, loss[loss=0.08845, beats_loss=0.01401, ecapa_loss=0.0001687, whisper_loss=0.07275, over 20921.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0112, ecapa_loss=0.0001754, whisper_loss=0.08964, over 3827384.10 frames. ], batch size: 89, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 05:05:21,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.66 vs. limit=10.0
2024-08-12 05:05:23,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1470860.0, ans=0.125
2024-08-12 05:05:55,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1471060.0, ans=0.0
2024-08-12 05:05:55,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1471060.0, ans=0.0
2024-08-12 05:06:02,344 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 from AS
2024-08-12 05:06:05,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1471060.0, ans=0.125
2024-08-12 05:06:09,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.509e+01 2.893e+01 3.375e+01 5.887e+01, threshold=5.785e+01, percent-clipped=2.0
2024-08-12 05:06:10,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1471160.0, ans=0.125
2024-08-12 05:06:13,143 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.241e+05
2024-08-12 05:06:17,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1471160.0, ans=0.125
2024-08-12 05:06:23,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2200, loss[loss=0.1112, beats_loss=0.01233, ecapa_loss=0.000201, whisper_loss=0.09686, over 14320.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01119, ecapa_loss=0.0001768, whisper_loss=0.08993, over 3819595.42 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 05:06:34,278 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=12.0
2024-08-12 05:06:54,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1471460.0, ans=0.1
2024-08-12 05:07:03,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1471460.0, ans=0.0
2024-08-12 05:07:05,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1471460.0, ans=0.0
2024-08-12 05:07:15,365 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 from AS
2024-08-12 05:07:28,469 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 from AS
2024-08-12 05:07:41,042 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2250, loss[loss=0.0802, beats_loss=0.01372, ecapa_loss=0.0001651, whisper_loss=0.06482, over 21589.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01119, ecapa_loss=0.0001781, whisper_loss=0.09013, over 3824685.94 frames. ], batch size: 88, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:07:44,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1471760.0, ans=0.125
2024-08-12 05:08:00,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1471860.0, ans=0.1
2024-08-12 05:08:20,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1471960.0, ans=0.125
2024-08-12 05:08:25,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1471960.0, ans=0.125
2024-08-12 05:08:54,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.613e+01 2.941e+01 3.406e+01 8.387e+01, threshold=5.883e+01, percent-clipped=3.0
2024-08-12 05:09:10,045 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 12 from Vox, 28 from AS
2024-08-12 05:09:11,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2300, loss[loss=0.1042, beats_loss=0.01431, ecapa_loss=0.0001702, whisper_loss=0.08816, over 14530.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01124, ecapa_loss=0.0001784, whisper_loss=0.09059, over 3853054.27 frames. ], batch size: 60, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:09:40,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1472360.0, ans=0.125
2024-08-12 05:10:07,897 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS
2024-08-12 05:10:22,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1472560.0, ans=0.07
2024-08-12 05:10:32,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1472660.0, ans=0.125
2024-08-12 05:10:46,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2350, loss[loss=0.09937, beats_loss=0.009698, ecapa_loss=0.0002684, whisper_loss=0.08698, over 16187.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001804, whisper_loss=0.09164, over 3839935.55 frames. ], batch size: 70, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:10:59,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=15.0
2024-08-12 05:11:06,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1472760.0, ans=0.0
2024-08-12 05:11:36,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5
2024-08-12 05:12:18,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.614e+01 3.008e+01 3.445e+01 5.971e+01, threshold=6.017e+01, percent-clipped=1.0
2024-08-12 05:12:27,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1473160.0, ans=0.125
2024-08-12 05:12:31,975 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 15 from Vox, 32 from AS
2024-08-12 05:12:33,209 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 from AS
2024-08-12 05:12:37,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2400, loss[loss=0.104, beats_loss=0.01098, ecapa_loss=0.0002144, whisper_loss=0.0909, over 17704.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001798, whisper_loss=0.09157, over 3835628.61 frames. ], batch size: 73, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:12:56,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5
2024-08-12 05:12:59,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1473360.0, ans=0.125
2024-08-12 05:13:10,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1473360.0, ans=0.125
2024-08-12 05:13:23,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1473460.0, ans=0.04949747468305833
2024-08-12 05:13:26,521 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-12 05:13:32,357 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 31 from LS+wenet, 20 from Vox, 24 from AS
2024-08-12 05:13:37,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1473560.0, ans=0.2
2024-08-12 05:14:05,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1473660.0, ans=0.1
2024-08-12 05:14:12,728 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS
2024-08-12 05:14:13,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=22.5
2024-08-12 05:14:20,441 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2450, loss[loss=0.1106, beats_loss=0.01064, ecapa_loss=0.0001855, whisper_loss=0.09814, over 17828.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01107, ecapa_loss=0.0001806, whisper_loss=0.091, over 3812199.64 frames. ], batch size: 72, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:14:41,613 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 23 from LS+wenet, 17 from Vox, 14 from AS
2024-08-12 05:15:02,929 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS
2024-08-12 05:15:05,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1473960.0, ans=0.2
2024-08-12 05:15:13,250 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 from AS
2024-08-12 05:15:18,833 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS
2024-08-12 05:15:32,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1474060.0, ans=0.0
2024-08-12 05:15:38,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.575e+01 2.893e+01 3.388e+01 4.265e+01, threshold=5.785e+01, percent-clipped=0.0
2024-08-12 05:15:51,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2500, loss[loss=0.1147, beats_loss=0.008623, ecapa_loss=0.0001778, whisper_loss=0.1043, over 19717.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01099, ecapa_loss=0.000181, whisper_loss=0.09151, over 3847939.17 frames. ], batch size: 77, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:15:55,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1474260.0, ans=0.0
2024-08-12 05:15:59,028 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-12 05:16:04,845 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0
2024-08-12 05:16:19,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1474460.0, ans=0.1
2024-08-12 05:16:23,403 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 from AS
2024-08-12 05:16:48,719 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 from AS
2024-08-12 05:16:54,828 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2550, loss[loss=0.1103, beats_loss=0.009265, ecapa_loss=0.0001881, whisper_loss=0.0992, over 20972.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001832, whisper_loss=0.09261, over 3858139.34 frames. ], batch size: 84, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:16:58,572 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 from AS
2024-08-12 05:17:03,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1474760.0, ans=0.09899494936611666
2024-08-12 05:17:47,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.613e+01 2.908e+01 3.447e+01 1.061e+02, threshold=5.817e+01, percent-clipped=1.0
2024-08-12 05:17:49,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1475160.0, ans=0.95
2024-08-12 05:17:57,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1475160.0, ans=0.09899494936611666
2024-08-12 05:17:59,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2600, loss[loss=0.1115, beats_loss=0.01011, ecapa_loss=0.0001762, whisper_loss=0.09962, over 20702.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001828, whisper_loss=0.09263, over 3847037.66 frames. ], batch size: 81, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:18:29,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1475460.0, ans=0.0
2024-08-12 05:18:42,768 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 05:19:03,592 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2650, loss[loss=0.1182, beats_loss=0.008287, ecapa_loss=0.0001941, whisper_loss=0.108, over 22384.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.000183, whisper_loss=0.09226, over 3877442.20 frames. ], batch size: 90, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:19:03,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1475760.0, ans=0.2
2024-08-12 05:19:05,162 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 39 from LS+wenet, 16 from Vox, 34 from AS
2024-08-12 05:19:15,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1475860.0, ans=0.95
2024-08-12 05:19:27,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0
2024-08-12 05:19:29,889 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 05:19:36,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1475960.0, ans=0.2
2024-08-12 05:19:44,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1476060.0, ans=0.125
2024-08-12 05:19:45,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1476060.0, ans=0.0
2024-08-12 05:19:47,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1476060.0, ans=0.125
2024-08-12 05:19:55,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1476160.0, ans=0.125
2024-08-12 05:19:56,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.506e+01 2.786e+01 3.189e+01 5.235e+01, threshold=5.572e+01, percent-clipped=0.0
2024-08-12 05:20:06,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1476160.0, ans=0.125
2024-08-12 05:20:07,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1476260.0, ans=0.1
2024-08-12 05:20:08,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2700, loss[loss=0.0899, beats_loss=0.01057, ecapa_loss=0.0001895, whisper_loss=0.07743, over 13118.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001815, whisper_loss=0.0924, over 3855645.22 frames. ], batch size: 53, lr: 5.83e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:20:16,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2024-08-12 05:20:20,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1476360.0, ans=0.2
2024-08-12 05:20:33,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1476460.0, ans=0.125
2024-08-12 05:20:34,189 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS
2024-08-12 05:20:37,683 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 37 from LS+wenet, 10 from Vox, 36 from AS
2024-08-12 05:20:41,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1476460.0, ans=0.1
2024-08-12 05:21:12,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1476760.0, ans=0.1
2024-08-12 05:21:13,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2750, loss[loss=0.09076, beats_loss=0.01268, ecapa_loss=0.0001399, whisper_loss=0.07668, over 19279.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001814, whisper_loss=0.09228, over 3865841.46 frames. ], batch size: 76, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:21:34,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1476860.0, ans=0.125
2024-08-12 05:21:35,117 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS
2024-08-12 05:21:37,591 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 27 from Vox, 43 from AS
2024-08-12 05:21:44,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1476960.0, ans=0.125
2024-08-12 05:21:53,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1477060.0, ans=0.0
2024-08-12 05:22:03,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=1477160.0, ans=0.2
2024-08-12 05:22:05,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.574e+01 2.886e+01 3.333e+01 4.847e+01, threshold=5.772e+01, percent-clipped=0.0
2024-08-12 05:22:15,965 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 from AS
2024-08-12 05:22:17,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2800, loss[loss=0.08414, beats_loss=0.01051, ecapa_loss=0.0001842, whisper_loss=0.07178, over 14908.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001813, whisper_loss=0.09255, over 3861167.48 frames. ], batch size: 60, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:22:25,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1477260.0, ans=0.0
2024-08-12 05:22:31,792 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 from AS
2024-08-12 05:22:37,160 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 32 from LS+wenet, 17 from Vox, 32 from AS
2024-08-12 05:22:40,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0
2024-08-12 05:22:50,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=12.0
2024-08-12 05:22:55,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1477560.0, ans=0.025
2024-08-12 05:23:09,165 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 05:23:11,011 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0
2024-08-12 05:23:16,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5
2024-08-12 05:23:24,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1477760.0, ans=0.125
2024-08-12 05:23:25,579 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2850, loss[loss=0.07537, beats_loss=0.0122, ecapa_loss=0.0001735, whisper_loss=0.06143, over 17610.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01105, ecapa_loss=0.000181, whisper_loss=0.09284, over 3857511.15 frames. ], batch size: 73, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:23:29,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1477760.0, ans=0.05
2024-08-12 05:23:38,932 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS
2024-08-12 05:23:56,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1477960.0, ans=0.2
2024-08-12 05:24:05,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1477960.0, ans=0.125
2024-08-12 05:24:16,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0
2024-08-12 05:24:30,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1478160.0, ans=0.0
2024-08-12 05:24:30,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 3.053e+01 3.517e+01 5.532e+01, threshold=6.106e+01, percent-clipped=0.0
2024-08-12 05:24:42,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1478160.0, ans=0.0
2024-08-12 05:24:44,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2900, loss[loss=0.1394, beats_loss=0.007646, ecapa_loss=0.0001637, whisper_loss=0.1302, over 22642.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001802, whisper_loss=0.09298, over 3883426.27 frames. ], batch size: 87, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:25:19,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1478460.0, ans=0.0
2024-08-12 05:25:26,380 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-12 05:25:43,599 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 05:25:54,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1478760.0, ans=0.1
2024-08-12 05:25:55,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 2950, loss[loss=0.08979, beats_loss=0.01382, ecapa_loss=0.0001703, whisper_loss=0.07427, over 21021.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01112, ecapa_loss=0.0001803, whisper_loss=0.09273, over 3936367.52 frames. ], batch size: 88, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:26:03,225 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 33 from Vox, 29 from AS
2024-08-12 05:26:16,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.29 vs. limit=22.5
2024-08-12 05:26:25,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1478960.0, ans=0.0
2024-08-12 05:26:28,258 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 from AS
2024-08-12 05:26:48,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.658e+01 2.945e+01 3.393e+01 5.337e+01, threshold=5.890e+01, percent-clipped=0.0
2024-08-12 05:26:49,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1479160.0, ans=0.0
2024-08-12 05:27:00,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3000, loss[loss=0.1005, beats_loss=0.01082, ecapa_loss=0.0001569, whisper_loss=0.08809, over 17682.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01115, ecapa_loss=0.0001795, whisper_loss=0.09287, over 3926818.19 frames. ], batch size: 68, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:27:00,061 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-12 05:27:41,700 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on ASR_libri: loss=0.2561, beats_loss=0, ecapa_loss=0.0006006, whisper_loss=0.2501, over 922467.00 frames.
2024-08-12 05:27:58,583 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on SV_voxceleb1: loss=0.004832, beats_loss=0, ecapa_loss=0.0004832, whisper_loss=0, over 939242.00 frames.
2024-08-12 05:30:00,016 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on AT_audioset: loss=0.02445, beats_loss=0.02445, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 05:30:00,020 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-12 05:30:03,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1479260.0, ans=0.125
2024-08-12 05:30:03,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1479260.0, ans=0.0
2024-08-12 05:30:14,237 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS
2024-08-12 05:30:40,921 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 23 from Vox, 35 from AS
2024-08-12 05:30:49,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1479560.0, ans=0.125
2024-08-12 05:30:58,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1479660.0, ans=0.0
2024-08-12 05:30:59,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1479660.0, ans=0.0
2024-08-12 05:31:04,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3050, loss[loss=0.08053, beats_loss=0.01453, ecapa_loss=0.0001363, whisper_loss=0.06464, over 22262.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01109, ecapa_loss=0.0001804, whisper_loss=0.0931, over 3921079.49 frames. ], batch size: 88, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:31:25,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1479860.0, ans=0.125
2024-08-12 05:31:29,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1479960.0, ans=0.5
2024-08-12 05:31:29,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1479960.0, ans=0.125
2024-08-12 05:31:40,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1479960.0, ans=0.125
2024-08-12 05:31:44,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1479960.0, ans=0.0
2024-08-12 05:31:44,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1479960.0, ans=0.2
2024-08-12 05:31:54,712 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-12 05:32:00,993 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.536e+01 2.925e+01 3.464e+01 9.985e+01, threshold=5.850e+01, percent-clipped=2.0
2024-08-12 05:32:04,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1480160.0, ans=0.0
2024-08-12 05:32:12,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3100, loss[loss=0.09254, beats_loss=0.01123, ecapa_loss=0.0001872, whisper_loss=0.07943, over 16882.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001809, whisper_loss=0.09296, over 3905743.57 frames. ], batch size: 68, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:32:14,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1480260.0, ans=0.125
2024-08-12 05:32:22,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0
2024-08-12 05:32:24,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1480360.0, ans=0.0
2024-08-12 05:32:25,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1480360.0, ans=0.0
2024-08-12 05:32:37,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1480460.0, ans=0.125
2024-08-12 05:32:48,704 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 from AS
2024-08-12 05:32:57,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1480560.0, ans=0.0
2024-08-12 05:33:03,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=12.0
2024-08-12 05:33:14,609 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=22.5
2024-08-12 05:33:17,571 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3150, loss[loss=0.1127, beats_loss=0.01078, ecapa_loss=0.0001432, whisper_loss=0.1005, over 18329.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01107, ecapa_loss=0.0001816, whisper_loss=0.093, over 3882440.61 frames. ], batch size: 71, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:33:21,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1480760.0, ans=0.2
2024-08-12 05:33:43,405 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS
2024-08-12 05:33:46,234 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.846e-02
2024-08-12 05:33:51,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1480960.0, ans=0.04949747468305833
2024-08-12 05:33:52,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1480960.0, ans=0.1
2024-08-12 05:33:54,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1480960.0, ans=0.2
2024-08-12 05:33:56,750 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS
2024-08-12 05:33:56,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1481060.0, ans=0.1
2024-08-12 05:34:10,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.633e+01 2.990e+01 3.410e+01 4.926e+01, threshold=5.980e+01, percent-clipped=0.0
2024-08-12 05:34:16,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1481160.0, ans=0.1
2024-08-12 05:34:22,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3200, loss[loss=0.09236, beats_loss=0.01118, ecapa_loss=0.0001262, whisper_loss=0.07992, over 15417.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01105, ecapa_loss=0.0001804, whisper_loss=0.09287, over 3840217.94 frames. ], batch size: 54, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:34:31,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1481260.0, ans=0.0
2024-08-12 05:34:42,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1481360.0, ans=0.0
2024-08-12 05:34:47,365 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 05:34:59,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1481460.0, ans=0.125
2024-08-12 05:35:16,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=12.0
2024-08-12 05:35:27,334 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3250, loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.0001887, whisper_loss=0.09032, over 21907.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.011, ecapa_loss=0.0001812, whisper_loss=0.0936, over 3868559.86 frames. ], batch size: 90, lr: 5.82e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:35:30,331 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 05:35:35,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1481760.0, ans=0.5
2024-08-12 05:36:21,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.550e+01 2.874e+01 3.283e+01 4.994e+01, threshold=5.748e+01, percent-clipped=0.0
2024-08-12 05:36:26,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1482160.0, ans=0.125
2024-08-12 05:36:29,429 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 from AS
2024-08-12 05:36:33,128 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3300, loss[loss=0.1025, beats_loss=0.00949, ecapa_loss=0.0002072, whisper_loss=0.09095, over 14270.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001809, whisper_loss=0.09309, over 3844343.30 frames. ], batch size: 55, lr: 5.81e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:36:33,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1482260.0, ans=0.125
2024-08-12 05:36:57,862 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS
2024-08-12 05:37:19,928 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS
2024-08-12 05:37:22,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1482560.0, ans=0.04949747468305833
2024-08-12 05:37:25,032 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 from AS
2024-08-12 05:37:27,619 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 from AS
2024-08-12 05:37:36,523 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 from AS
2024-08-12 05:37:37,754 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3350, loss[loss=0.1241, beats_loss=0.007552, ecapa_loss=0.0001949, whisper_loss=0.1146, over 20071.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01102, ecapa_loss=0.0001799, whisper_loss=0.09333, over 3854110.51 frames. ], batch size: 73, lr: 5.81e-03, grad_scale: 5.764607523034235e+17
2024-08-12 05:38:05,132 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 05:38:30,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.536e+01 3.034e+01 3.396e+01 1.773e+02, threshold=6.068e+01, percent-clipped=2.0
2024-08-12 05:38:32,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1483160.0, ans=0.2
2024-08-12 05:38:34,512 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 14 from Vox, 46 from AS
2024-08-12 05:38:37,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1483160.0, ans=0.125
2024-08-12 05:38:37,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1483160.0, ans=0.1
2024-08-12 05:38:38,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1483160.0, ans=0.125
2024-08-12 05:38:42,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3400, loss[loss=0.1072, beats_loss=0.01088, ecapa_loss=0.0002337, whisper_loss=0.09398, over 19520.00 frames.
], tot_loss[loss=0.1064, beats_loss=0.01098, ecapa_loss=0.0001807, whisper_loss=0.09356, over 3893340.90 frames. ], batch size: 82, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:39:03,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1483360.0, ans=0.0 2024-08-12 05:39:14,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1483460.0, ans=0.2 2024-08-12 05:39:18,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1483460.0, ans=0.125 2024-08-12 05:39:19,757 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 05:39:24,138 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 41 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 05:39:42,984 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 05:39:50,000 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3450, loss[loss=0.1249, beats_loss=0.008961, ecapa_loss=0.0001783, whisper_loss=0.1142, over 18054.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01098, ecapa_loss=0.0001812, whisper_loss=0.0935, over 3899965.92 frames. ], batch size: 68, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:39:51,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1483760.0, ans=0.1 2024-08-12 05:40:02,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1483860.0, ans=0.125 2024-08-12 05:40:13,472 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 05:40:41,435 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 05:40:46,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.618e+01 3.064e+01 3.498e+01 5.812e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-12 05:40:59,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3500, loss[loss=0.1054, beats_loss=0.01252, ecapa_loss=0.0001478, whisper_loss=0.09139, over 16991.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.011, ecapa_loss=0.0001816, whisper_loss=0.09299, over 3863612.54 frames. ], batch size: 67, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:41:11,238 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 05:41:26,963 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 05:41:27,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1484460.0, ans=0.0 2024-08-12 05:41:43,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1484560.0, ans=0.125 2024-08-12 05:41:43,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-12 05:41:46,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=1484560.0, ans=12.0 2024-08-12 05:41:52,895 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 05:42:03,481 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 05:42:10,318 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3550, loss[loss=0.1016, beats_loss=0.01081, ecapa_loss=0.0001931, whisper_loss=0.08881, over 18263.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001823, whisper_loss=0.09267, over 3888218.78 frames. ], batch size: 74, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:42:12,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1484760.0, ans=0.0 2024-08-12 05:42:32,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=12.0 2024-08-12 05:42:33,541 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 35 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 05:42:39,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1484960.0, ans=0.0 2024-08-12 05:42:41,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1484960.0, ans=0.125 2024-08-12 05:42:42,288 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 31 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 05:42:48,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1484960.0, ans=0.05 2024-08-12 05:42:52,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=15.0 2024-08-12 05:42:53,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1485060.0, ans=0.125 2024-08-12 05:42:55,606 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 05:43:00,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.29 vs. 
limit=15.0 2024-08-12 05:43:04,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1485060.0, ans=10.0 2024-08-12 05:43:08,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1485160.0, ans=0.125 2024-08-12 05:43:09,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.668e+01 2.975e+01 3.438e+01 5.088e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-12 05:43:13,111 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 9 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 05:43:19,923 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 05:43:22,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3600, loss[loss=0.1173, beats_loss=0.01021, ecapa_loss=0.0002065, whisper_loss=0.105, over 18440.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001829, whisper_loss=0.09314, over 3888772.79 frames. ], batch size: 75, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:43:32,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-12 05:43:40,634 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 05:43:44,662 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 05:44:05,885 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 05:44:32,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1485760.0, ans=0.2 2024-08-12 05:44:33,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3650, loss[loss=0.1169, beats_loss=0.01111, ecapa_loss=0.0001896, whisper_loss=0.1039, over 23371.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01102, ecapa_loss=0.0001833, whisper_loss=0.09269, over 3871715.05 frames. ], batch size: 92, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:44:34,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2024-08-12 05:44:35,515 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 05:44:35,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1485760.0, ans=0.07 2024-08-12 05:44:43,848 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 05:44:50,951 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 05:45:04,424 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 05:45:06,901 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 05:45:14,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1485960.0, ans=0.125 2024-08-12 05:45:32,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.529e+01 2.870e+01 3.231e+01 5.224e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-12 05:45:38,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1486160.0, ans=0.025 2024-08-12 05:45:40,610 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-12 05:45:45,719 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3700, loss[loss=0.106, beats_loss=0.01043, ecapa_loss=0.0001842, whisper_loss=0.09375, over 17078.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001838, whisper_loss=0.09237, over 3851142.73 frames. ], batch size: 68, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:45:49,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1486260.0, ans=0.125 2024-08-12 05:45:52,378 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 05:45:56,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1486260.0, ans=0.2 2024-08-12 05:46:20,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1486460.0, ans=0.125 2024-08-12 05:46:26,250 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 05:46:26,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1486460.0, ans=0.2 2024-08-12 05:46:31,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1486560.0, ans=0.0 2024-08-12 05:46:39,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1486560.0, ans=0.2 2024-08-12 05:46:46,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-12 05:46:50,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486660.0, ans=0.1 2024-08-12 05:46:51,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1486660.0, ans=0.0 2024-08-12 05:46:54,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2024-08-12 05:46:56,479 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 35 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 05:46:57,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-08-12 05:46:57,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3750, loss[loss=0.1497, beats_loss=0.007322, ecapa_loss=0.000192, whisper_loss=0.1405, over 18229.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001824, whisper_loss=0.09256, over 3847379.87 frames. 
], batch size: 69, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:47:20,774 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 05:47:25,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1486960.0, ans=0.1 2024-08-12 05:47:26,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1486960.0, ans=0.2 2024-08-12 05:47:31,662 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 05:47:54,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1487160.0, ans=0.2 2024-08-12 05:47:55,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.578e+01 2.846e+01 3.197e+01 4.164e+01, threshold=5.692e+01, percent-clipped=0.0 2024-08-12 05:47:57,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1487160.0, ans=0.025 2024-08-12 05:48:09,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3800, loss[loss=0.1174, beats_loss=0.0097, ecapa_loss=0.000197, whisper_loss=0.1058, over 13851.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001813, whisper_loss=0.09228, over 3837908.39 frames. ], batch size: 55, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:48:13,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1487260.0, ans=0.1 2024-08-12 05:48:15,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1487260.0, ans=0.125 2024-08-12 05:48:37,446 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 05:49:19,416 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 05:49:20,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1487660.0, ans=0.1 2024-08-12 05:49:22,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3850, loss[loss=0.08141, beats_loss=0.01411, ecapa_loss=0.0001257, whisper_loss=0.06604, over 17695.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01116, ecapa_loss=0.0001812, whisper_loss=0.0914, over 3810149.22 frames. ], batch size: 70, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:50:02,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1487960.0, ans=0.125 2024-08-12 05:50:21,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1488160.0, ans=0.125 2024-08-12 05:50:21,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1488160.0, ans=0.0 2024-08-12 05:50:22,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.543e+01 2.911e+01 3.298e+01 4.140e+01, threshold=5.821e+01, percent-clipped=0.0 2024-08-12 05:50:24,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1488160.0, ans=0.1 2024-08-12 05:50:29,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2024-08-12 05:50:35,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3900, loss[loss=0.1062, beats_loss=0.008143, ecapa_loss=0.0002551, whisper_loss=0.09548, over 21395.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001811, whisper_loss=0.09163, over 3828400.26 frames. ], batch size: 90, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:50:42,170 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 40 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 05:51:17,533 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 05:51:17,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1488460.0, ans=0.2 2024-08-12 05:51:28,900 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 05:51:34,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1488560.0, ans=0.125 2024-08-12 05:51:35,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1488660.0, ans=0.125 2024-08-12 05:51:48,943 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 05:51:51,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 3950, loss[loss=0.09693, beats_loss=0.01146, ecapa_loss=0.0002111, whisper_loss=0.08336, over 20544.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.0001825, whisper_loss=0.09207, over 3875913.51 frames. ], batch size: 90, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:51:59,395 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 05:52:19,392 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 05:52:21,142 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 05:52:28,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1488960.0, ans=0.125 2024-08-12 05:52:47,479 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 05:52:50,382 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 05:52:53,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.639e+01 2.878e+01 3.466e+01 7.368e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 05:52:57,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1489160.0, ans=0.125 2024-08-12 05:53:01,554 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 05:53:07,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4000, loss[loss=0.1011, beats_loss=0.01133, ecapa_loss=0.0001924, whisper_loss=0.08783, over 15978.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001818, whisper_loss=0.09207, over 3873437.99 frames. ], batch size: 64, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:53:27,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1489360.0, ans=0.125 2024-08-12 05:53:34,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1489360.0, ans=10.0 2024-08-12 05:53:48,252 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 05:54:09,546 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 05:54:17,516 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 05:54:23,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4050, loss[loss=0.1135, beats_loss=0.01055, ecapa_loss=0.0002085, whisper_loss=0.1009, over 18442.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01118, ecapa_loss=0.000182, whisper_loss=0.091, over 3848886.60 frames. ], batch size: 74, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:54:39,350 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0 2024-08-12 05:54:56,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1489960.0, ans=0.2 2024-08-12 05:55:00,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-12 05:55:10,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1490060.0, ans=0.0 2024-08-12 05:55:19,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1490060.0, ans=0.125 2024-08-12 05:55:19,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1490060.0, ans=0.0 2024-08-12 05:55:25,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.625e+01 2.909e+01 3.364e+01 7.852e+01, threshold=5.817e+01, percent-clipped=2.0 2024-08-12 05:55:25,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1490160.0, ans=0.0 2024-08-12 05:55:39,770 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4100, loss[loss=0.08716, beats_loss=0.01093, ecapa_loss=0.0001739, whisper_loss=0.07449, over 20593.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001839, whisper_loss=0.09183, over 3848793.39 frames. ], batch size: 83, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:55:41,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1490260.0, ans=0.0 2024-08-12 05:55:59,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1490360.0, ans=0.125 2024-08-12 05:56:12,273 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 05:56:23,440 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 05:56:26,214 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 12 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 05:56:39,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1490660.0, ans=0.125 2024-08-12 05:56:40,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1490660.0, ans=0.0 2024-08-12 05:56:42,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1490660.0, ans=0.0 2024-08-12 05:56:56,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4150, loss[loss=0.103, beats_loss=0.01262, ecapa_loss=0.0002105, whisper_loss=0.08832, over 22443.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01112, ecapa_loss=0.0001846, whisper_loss=0.09165, over 3841158.56 frames. ], batch size: 94, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:56:58,943 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 05:57:15,966 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:57:39,123 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 05:57:41,959 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 05:57:47,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-12 05:57:49,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2024-08-12 05:58:01,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.619e+01 2.867e+01 3.217e+01 5.431e+01, threshold=5.734e+01, percent-clipped=0.0 2024-08-12 05:58:15,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4200, loss[loss=0.07761, beats_loss=0.01328, ecapa_loss=0.0001537, whisper_loss=0.06279, over 14471.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01119, ecapa_loss=0.0001829, whisper_loss=0.09078, over 3851697.51 frames. ], batch size: 59, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:58:27,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. 
limit=12.0 2024-08-12 05:58:33,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1491360.0, ans=0.1 2024-08-12 05:58:49,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1491460.0, ans=0.07 2024-08-12 05:59:31,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-12 05:59:33,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1491760.0, ans=0.125 2024-08-12 05:59:34,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4250, loss[loss=0.1344, beats_loss=0.009372, ecapa_loss=0.000162, whisper_loss=0.1234, over 23324.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01112, ecapa_loss=0.0001824, whisper_loss=0.09104, over 3872621.49 frames. ], batch size: 89, lr: 5.80e-03, grad_scale: 1.152921504606847e+18 2024-08-12 05:59:45,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1491760.0, ans=0.0 2024-08-12 06:00:13,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1491960.0, ans=0.125 2024-08-12 06:00:15,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1491960.0, ans=0.5 2024-08-12 06:00:18,697 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-12 06:00:31,311 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 06:00:36,044 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 06:00:40,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.448e+01 2.725e+01 3.062e+01 4.978e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-12 06:00:41,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1492160.0, ans=0.125 2024-08-12 06:00:52,144 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 06:00:56,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4300, loss[loss=0.1119, beats_loss=0.01118, ecapa_loss=0.0002184, whisper_loss=0.09853, over 22695.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01108, ecapa_loss=0.0001818, whisper_loss=0.09117, over 3873093.81 frames. ], batch size: 93, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:01:17,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1492360.0, ans=0.2 2024-08-12 06:01:28,229 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 06:01:36,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1492460.0, ans=0.2 2024-08-12 06:01:50,823 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 06:02:16,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4350, loss[loss=0.1122, beats_loss=0.01021, ecapa_loss=0.0002014, whisper_loss=0.1, over 22142.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01107, ecapa_loss=0.0001822, whisper_loss=0.09056, over 3870673.52 frames. ], batch size: 90, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:02:26,357 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 06:02:26,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1492760.0, ans=0.2 2024-08-12 06:02:33,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1492860.0, ans=0.125 2024-08-12 06:02:49,280 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:02:49,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1492960.0, ans=0.125 2024-08-12 06:02:52,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1492960.0, ans=0.07 2024-08-12 06:03:05,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1493060.0, ans=0.2 2024-08-12 06:03:05,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1493060.0, ans=0.2 2024-08-12 06:03:08,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1493060.0, ans=0.0 2024-08-12 06:03:08,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1493060.0, ans=0.2 2024-08-12 06:03:25,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.573e+01 2.985e+01 3.568e+01 9.873e+01, threshold=5.969e+01, percent-clipped=3.0 2024-08-12 06:03:27,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1493160.0, ans=0.125 2024-08-12 06:03:29,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1493160.0, ans=0.2 
2024-08-12 06:03:40,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4400, loss[loss=0.09736, beats_loss=0.01401, ecapa_loss=0.0001487, whisper_loss=0.08186, over 23665.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0111, ecapa_loss=0.000182, whisper_loss=0.09037, over 3862842.30 frames. ], batch size: 95, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:03:43,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1493260.0, ans=0.125 2024-08-12 06:04:11,388 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-12 06:04:13,486 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 06:04:18,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1493460.0, ans=0.1 2024-08-12 06:04:25,856 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 06:04:32,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. limit=10.0 2024-08-12 06:04:35,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1493560.0, ans=0.5 2024-08-12 06:05:05,058 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4450, loss[loss=0.1081, beats_loss=0.01201, ecapa_loss=0.0001344, whisper_loss=0.0948, over 23318.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01109, ecapa_loss=0.0001808, whisper_loss=0.09084, over 3862320.78 frames. 
], batch size: 91, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:05:10,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1493760.0, ans=0.125 2024-08-12 06:05:13,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1493760.0, ans=0.1 2024-08-12 06:05:16,664 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 06:05:30,818 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 06:05:33,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1493860.0, ans=0.125 2024-08-12 06:05:36,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1493860.0, ans=0.0 2024-08-12 06:06:13,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.568e+01 2.752e+01 3.153e+01 4.560e+01, threshold=5.503e+01, percent-clipped=0.0 2024-08-12 06:06:21,024 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 06:06:22,843 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 06:06:29,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4500, loss[loss=0.1025, beats_loss=0.0119, ecapa_loss=0.0001561, whisper_loss=0.08904, over 20511.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001808, whisper_loss=0.09117, over 3886125.65 frames. 
], batch size: 83, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:06:45,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1494360.0, ans=0.0 2024-08-12 06:07:02,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1494460.0, ans=0.1 2024-08-12 06:07:03,981 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 06:07:08,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1494460.0, ans=0.0 2024-08-12 06:07:17,866 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 06:07:28,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1494560.0, ans=0.0 2024-08-12 06:07:50,756 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 06:07:53,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0 2024-08-12 06:07:55,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4550, loss[loss=0.09401, beats_loss=0.009873, ecapa_loss=0.0001997, whisper_loss=0.08214, over 22154.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01109, ecapa_loss=0.0001822, whisper_loss=0.09111, over 3938762.75 frames. 
], batch size: 93, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:08:15,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1494860.0, ans=0.125 2024-08-12 06:08:17,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1494860.0, ans=0.125 2024-08-12 06:09:01,709 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-12 06:09:05,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.505e+01 2.717e+01 3.004e+01 5.094e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-12 06:09:20,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4600, loss[loss=0.09216, beats_loss=0.009022, ecapa_loss=0.0001756, whisper_loss=0.08138, over 15484.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01109, ecapa_loss=0.0001831, whisper_loss=0.09075, over 3900963.25 frames. ], batch size: 61, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:09:43,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1495360.0, ans=0.1 2024-08-12 06:10:02,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1495460.0, ans=0.1 2024-08-12 06:10:02,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1495460.0, ans=0.2 2024-08-12 06:10:13,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1495560.0, ans=0.0 2024-08-12 06:10:16,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1495560.0, ans=0.125 2024-08-12 06:10:17,929 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
14 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 06:10:44,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4650, loss[loss=0.1099, beats_loss=0.01126, ecapa_loss=0.00018, whisper_loss=0.09685, over 18690.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01111, ecapa_loss=0.0001824, whisper_loss=0.09119, over 3921401.14 frames. ], batch size: 74, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:10:52,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1495760.0, ans=0.125 2024-08-12 06:11:08,509 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-12 06:11:35,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-12 06:11:39,198 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 06:11:54,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.541e+01 2.726e+01 3.242e+01 5.233e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 06:11:55,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1496160.0, ans=0.0 2024-08-12 06:11:55,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1496160.0, ans=0.1 2024-08-12 06:12:09,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4700, loss[loss=0.1004, beats_loss=0.01211, ecapa_loss=0.0001435, whisper_loss=0.0869, over 19011.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01113, ecapa_loss=0.0001795, whisper_loss=0.09096, over 3917555.56 frames. ], batch size: 75, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:12:09,387 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 06:12:26,602 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 06:12:29,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1496360.0, ans=0.0 2024-08-12 06:12:32,866 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 18 from LS+wenet, 27 from Vox, 49 fro AS 2024-08-12 06:12:42,588 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 06:12:48,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1496460.0, ans=0.125 2024-08-12 06:12:59,555 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-12 06:13:06,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1496560.0, ans=0.2 2024-08-12 06:13:11,126 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 06:13:30,826 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4750, loss[loss=0.1217, beats_loss=0.008918, ecapa_loss=0.0002008, whisper_loss=0.1107, over 22102.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01118, ecapa_loss=0.0001798, whisper_loss=0.09045, over 3931482.17 frames. ], batch size: 88, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:13:31,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1496760.0, ans=0.0 2024-08-12 06:13:35,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1496760.0, ans=0.0 2024-08-12 06:13:41,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. 
limit=6.0 2024-08-12 06:14:00,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1496860.0, ans=0.125 2024-08-12 06:14:07,162 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 06:14:18,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1497060.0, ans=0.125 2024-08-12 06:14:27,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1497060.0, ans=0.125 2024-08-12 06:14:36,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.611e+01 2.918e+01 3.267e+01 6.538e+01, threshold=5.836e+01, percent-clipped=2.0 2024-08-12 06:14:42,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1497160.0, ans=0.125 2024-08-12 06:14:50,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-12 06:14:51,077 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4800, loss[loss=0.08883, beats_loss=0.01192, ecapa_loss=0.000178, whisper_loss=0.07513, over 18484.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01116, ecapa_loss=0.0001796, whisper_loss=0.09095, over 3930305.79 frames. 
], batch size: 76, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:14:53,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1497260.0, ans=0.2 2024-08-12 06:15:01,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1497260.0, ans=0.125 2024-08-12 06:15:04,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1497260.0, ans=0.0 2024-08-12 06:15:10,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=22.5 2024-08-12 06:15:11,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1497360.0, ans=0.125 2024-08-12 06:15:18,804 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 06:15:56,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1497660.0, ans=0.2 2024-08-12 06:16:10,464 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 06:16:13,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4850, loss[loss=0.1178, beats_loss=0.01025, ecapa_loss=0.0001995, whisper_loss=0.1055, over 17550.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01117, ecapa_loss=0.0001778, whisper_loss=0.09159, over 3947267.46 frames. ], batch size: 70, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:16:19,523 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
18 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 06:16:38,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1497860.0, ans=0.0 2024-08-12 06:16:47,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1497960.0, ans=0.1 2024-08-12 06:17:19,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.592e+01 2.912e+01 3.180e+01 4.291e+01, threshold=5.823e+01, percent-clipped=0.0 2024-08-12 06:17:34,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4900, loss[loss=0.08601, beats_loss=0.01126, ecapa_loss=0.0001884, whisper_loss=0.07286, over 19211.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01113, ecapa_loss=0.0001802, whisper_loss=0.09139, over 3925401.73 frames. ], batch size: 77, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:17:43,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1498260.0, ans=0.1 2024-08-12 06:17:57,194 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 06:17:59,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1498360.0, ans=0.125 2024-08-12 06:18:57,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 4950, loss[loss=0.09405, beats_loss=0.008883, ecapa_loss=0.0001819, whisper_loss=0.08335, over 15672.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0111, ecapa_loss=0.0001807, whisper_loss=0.09183, over 3878341.33 frames. 
], batch size: 61, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:19:09,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1498760.0, ans=0.125 2024-08-12 06:19:19,530 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 06:19:25,995 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 06:20:04,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.693e+01 3.094e+01 3.524e+01 6.311e+01, threshold=6.188e+01, percent-clipped=2.0 2024-08-12 06:20:17,259 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 06:20:19,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5000, loss[loss=0.1133, beats_loss=0.01299, ecapa_loss=0.0001494, whisper_loss=0.09878, over 24288.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01102, ecapa_loss=0.0001817, whisper_loss=0.0928, over 3857628.13 frames. ], batch size: 94, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:20:21,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1499260.0, ans=0.125 2024-08-12 06:20:23,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-08-12 06:20:48,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2024-08-12 06:20:56,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1499460.0, ans=0.0 2024-08-12 06:21:13,141 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
36 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-12 06:21:20,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1499560.0, ans=0.125 2024-08-12 06:21:41,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5050, loss[loss=0.1013, beats_loss=0.009452, ecapa_loss=0.0002362, whisper_loss=0.08953, over 14390.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.0001814, whisper_loss=0.09337, over 3897494.10 frames. ], batch size: 56, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:21:42,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1499760.0, ans=0.2 2024-08-12 06:21:44,490 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.037e+00 2024-08-12 06:22:01,723 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 06:22:17,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.67 vs. limit=22.5 2024-08-12 06:22:37,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500060.0, ans=0.1 2024-08-12 06:22:43,541 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 06:22:51,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.582e+01 2.858e+01 3.371e+01 2.461e+02, threshold=5.717e+01, percent-clipped=1.0 2024-08-12 06:23:05,950 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5100, loss[loss=0.09668, beats_loss=0.01226, ecapa_loss=0.0001844, whisper_loss=0.08258, over 21556.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01106, ecapa_loss=0.0001824, whisper_loss=0.09323, over 3919550.20 frames. 
], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:23:19,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1500260.0, ans=0.0 2024-08-12 06:23:36,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500360.0, ans=0.1 2024-08-12 06:23:48,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1500460.0, ans=0.2 2024-08-12 06:23:49,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-12 06:23:59,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1500560.0, ans=0.0 2024-08-12 06:24:15,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1500660.0, ans=0.125 2024-08-12 06:24:27,328 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5150, loss[loss=0.1205, beats_loss=0.009262, ecapa_loss=0.0001749, whisper_loss=0.1095, over 22760.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01112, ecapa_loss=0.0001808, whisper_loss=0.09363, over 3945263.28 frames. ], batch size: 87, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:24:36,891 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 06:24:40,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1500760.0, ans=0.125 2024-08-12 06:24:51,600 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 06:26:03,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.805e+01 3.216e+01 1.904e+02, threshold=5.610e+01, percent-clipped=1.0 2024-08-12 06:26:09,033 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 06:26:21,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5200, loss[loss=0.1081, beats_loss=0.0116, ecapa_loss=0.0001588, whisper_loss=0.09486, over 21986.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001801, whisper_loss=0.09298, over 3915666.96 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:26:40,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1501360.0, ans=0.125 2024-08-12 06:26:42,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1501360.0, ans=0.125 2024-08-12 06:26:44,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1501360.0, ans=0.04949747468305833 2024-08-12 06:26:46,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1501360.0, ans=0.125 2024-08-12 06:27:03,678 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 06:27:21,721 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 25 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 06:27:23,118 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 06:27:30,321 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
38 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 06:27:40,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1501660.0, ans=0.1 2024-08-12 06:27:41,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1501660.0, ans=0.0 2024-08-12 06:27:49,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5250, loss[loss=0.1133, beats_loss=0.01051, ecapa_loss=0.0002002, whisper_loss=0.1008, over 22708.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01107, ecapa_loss=0.0001816, whisper_loss=0.09343, over 3915351.99 frames. ], batch size: 92, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:27:50,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1501760.0, ans=0.0 2024-08-12 06:27:52,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1501760.0, ans=0.125 2024-08-12 06:28:20,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1501860.0, ans=0.125 2024-08-12 06:28:30,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1501960.0, ans=0.0 2024-08-12 06:28:58,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.556e+01 2.811e+01 3.138e+01 9.826e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 06:29:03,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1502160.0, ans=0.0 2024-08-12 06:29:04,124 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-12 06:29:13,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5300, loss[loss=0.1209, beats_loss=0.01161, ecapa_loss=0.0001647, whisper_loss=0.1076, over 18473.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01107, ecapa_loss=0.0001818, whisper_loss=0.09308, over 3893939.33 frames. ], batch size: 72, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:30:00,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=22.5 2024-08-12 06:30:07,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1502560.0, ans=0.05 2024-08-12 06:30:07,541 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-12 06:30:31,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1502660.0, ans=0.125 2024-08-12 06:30:32,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-08-12 06:30:33,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1502660.0, ans=0.0 2024-08-12 06:30:33,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1502660.0, ans=0.0 2024-08-12 06:30:35,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1502760.0, ans=0.125 2024-08-12 06:30:36,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5350, loss[loss=0.1129, beats_loss=0.00857, ecapa_loss=0.0002324, whisper_loss=0.102, over 19588.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0001826, whisper_loss=0.0928, over 3898548.21 frames. ], batch size: 85, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:31:01,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1502860.0, ans=0.125
2024-08-12 06:31:10,922 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS
2024-08-12 06:31:14,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1502960.0, ans=0.04949747468305833
2024-08-12 06:31:15,786 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 06:31:26,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=12.0
2024-08-12 06:31:41,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0
2024-08-12 06:31:43,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.466e+01 2.824e+01 3.264e+01 5.204e+01, threshold=5.648e+01, percent-clipped=0.0
2024-08-12 06:31:43,728 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 from AS
2024-08-12 06:31:50,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.00 vs. limit=22.5
2024-08-12 06:31:57,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5400, loss[loss=0.1255, beats_loss=0.01091, ecapa_loss=0.0001507, whisper_loss=0.1131, over 23578.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.0001836, whisper_loss=0.09261, over 3913125.17 frames. ], batch size: 91, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:32:01,000 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS
2024-08-12 06:32:09,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1503260.0, ans=0.0
2024-08-12 06:32:29,438 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS
2024-08-12 06:32:32,719 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 from AS
2024-08-12 06:32:36,195 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 from AS
2024-08-12 06:32:47,349 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 from AS
2024-08-12 06:32:48,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1503560.0, ans=0.125
2024-08-12 06:32:51,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1503560.0, ans=0.0
2024-08-12 06:32:53,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0
2024-08-12 06:33:17,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5450, loss[loss=0.1099, beats_loss=0.00886, ecapa_loss=0.0002064, whisper_loss=0.099, over 13817.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01101, ecapa_loss=0.0001827, whisper_loss=0.09295, over 3889906.91 frames. ], batch size: 56, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:33:21,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1503760.0, ans=0.1
2024-08-12 06:33:53,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0
2024-08-12 06:34:22,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1504160.0, ans=0.125
2024-08-12 06:34:23,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.572e+01 2.892e+01 3.418e+01 4.149e+01, threshold=5.785e+01, percent-clipped=0.0
2024-08-12 06:34:26,638 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 25 from LS+wenet, 20 from Vox, 17 from AS
2024-08-12 06:34:37,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5500, loss[loss=0.09966, beats_loss=0.01061, ecapa_loss=0.0001971, whisper_loss=0.08708, over 14324.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01095, ecapa_loss=0.0001831, whisper_loss=0.09366, over 3887964.63 frames. ], batch size: 59, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:35:09,136 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 12 from LS+wenet, 20 from Vox, 40 from AS
2024-08-12 06:35:12,280 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-12 06:35:15,364 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 from AS
2024-08-12 06:35:25,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1504560.0, ans=0.0
2024-08-12 06:35:47,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1504660.0, ans=0.09899494936611666
2024-08-12 06:35:55,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1504760.0, ans=0.125
2024-08-12 06:35:56,461 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5550, loss[loss=0.08262, beats_loss=0.009468, ecapa_loss=0.0001629, whisper_loss=0.07152, over 14386.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01095, ecapa_loss=0.0001815, whisper_loss=0.09405, over 3933707.61 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:36:01,805 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 06:36:13,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1504860.0, ans=0.125
2024-08-12 06:36:16,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1504860.0, ans=0.125
2024-08-12 06:36:20,968 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 39 from LS+wenet, 21 from Vox, 33 from AS
2024-08-12 06:36:28,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1504960.0, ans=22.5
2024-08-12 06:36:37,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0
2024-08-12 06:36:40,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1504960.0, ans=0.0
2024-08-12 06:36:45,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1505060.0, ans=0.125
2024-08-12 06:36:52,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1505060.0, ans=0.2
2024-08-12 06:36:54,776 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 06:36:56,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1505060.0, ans=0.07
2024-08-12 06:36:56,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0
2024-08-12 06:37:00,340 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 19 from LS+wenet, 28 from Vox, 44 from AS
2024-08-12 06:37:01,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.511e+01 2.832e+01 3.131e+01 5.675e+01, threshold=5.663e+01, percent-clipped=0.0
2024-08-12 06:37:07,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1505160.0, ans=0.125
2024-08-12 06:37:16,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5600, loss[loss=0.121, beats_loss=0.01341, ecapa_loss=0.0001253, whisper_loss=0.1063, over 21954.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01097, ecapa_loss=0.0001813, whisper_loss=0.09393, over 3948049.64 frames. ], batch size: 83, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:37:18,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1505260.0, ans=0.0
2024-08-12 06:37:19,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1505260.0, ans=0.0
2024-08-12 06:37:25,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0
2024-08-12 06:37:31,022 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 27 from Vox, 30 from AS
2024-08-12 06:37:46,299 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS
2024-08-12 06:38:27,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1505660.0, ans=0.125
2024-08-12 06:38:31,516 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 29 from LS+wenet, 21 from Vox, 46 from AS
2024-08-12 06:38:33,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1505660.0, ans=0.0
2024-08-12 06:38:36,853 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 15 from Vox, 40 from AS
2024-08-12 06:38:39,751 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5650, loss[loss=0.1145, beats_loss=0.01004, ecapa_loss=0.0001961, whisper_loss=0.1025, over 19234.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.0001802, whisper_loss=0.0928, over 3960294.67 frames. ], batch size: 78, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:38:43,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1505760.0, ans=0.125
2024-08-12 06:38:43,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1505760.0, ans=0.2
2024-08-12 06:38:57,677 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 from AS
2024-08-12 06:39:13,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1505960.0, ans=0.2
2024-08-12 06:39:13,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1505960.0, ans=0.0
2024-08-12 06:39:20,372 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 06:39:35,962 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 from AS
2024-08-12 06:39:44,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.537e+01 2.761e+01 3.260e+01 5.240e+01, threshold=5.523e+01, percent-clipped=0.0
2024-08-12 06:39:58,302 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 13 from Vox, 40 from AS
2024-08-12 06:39:59,383 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5700, loss[loss=0.1026, beats_loss=0.01491, ecapa_loss=0.0001365, whisper_loss=0.08631, over 18757.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0111, ecapa_loss=0.0001801, whisper_loss=0.0929, over 3967337.08 frames. ], batch size: 74, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:40:03,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1506260.0, ans=0.0
2024-08-12 06:40:08,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1506260.0, ans=0.0
2024-08-12 06:40:13,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1506260.0, ans=0.0
2024-08-12 06:40:13,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1506260.0, ans=0.2
2024-08-12 06:40:14,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1506360.0, ans=0.125
2024-08-12 06:40:25,570 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 from AS
2024-08-12 06:40:34,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1506460.0, ans=0.125
2024-08-12 06:40:41,966 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 06:40:42,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1506460.0, ans=0.0
2024-08-12 06:40:51,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1506560.0, ans=0.125
2024-08-12 06:40:57,702 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 24 from Vox, 26 from AS
2024-08-12 06:41:20,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1506760.0, ans=0.1
2024-08-12 06:41:21,946 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5750, loss[loss=0.09844, beats_loss=0.009863, ecapa_loss=0.0001788, whisper_loss=0.08679, over 14119.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001806, whisper_loss=0.09331, over 3964141.75 frames. ], batch size: 53, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:41:25,997 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 from AS
2024-08-12 06:41:39,278 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-12 06:41:42,824 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 from AS
2024-08-12 06:41:44,169 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 from AS
2024-08-12 06:41:52,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1506960.0, ans=0.125
2024-08-12 06:42:04,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1506960.0, ans=0.125
2024-08-12 06:42:04,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1506960.0, ans=0.125
2024-08-12 06:42:07,631 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 from AS
2024-08-12 06:42:27,042 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.547e+01 2.860e+01 3.182e+01 5.592e+01, threshold=5.721e+01, percent-clipped=1.0
2024-08-12 06:42:39,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1507160.0, ans=0.125
2024-08-12 06:42:41,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5800, loss[loss=0.111, beats_loss=0.008495, ecapa_loss=0.0001949, whisper_loss=0.1005, over 17852.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01103, ecapa_loss=0.0001812, whisper_loss=0.09314, over 3938508.74 frames. ], batch size: 69, lr: 5.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 06:42:48,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0
2024-08-12 06:42:55,911 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 06:43:05,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1507360.0, ans=0.0
2024-08-12 06:43:18,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1507460.0, ans=0.1
2024-08-12 06:43:27,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1507560.0, ans=0.2
2024-08-12 06:43:28,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1507560.0, ans=0.125
2024-08-12 06:43:55,965 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5850, loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0002144, whisper_loss=0.08896, over 18289.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.0001815, whisper_loss=0.09265, over 3913881.21 frames. ], batch size: 73, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:43:56,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1507760.0, ans=0.125
2024-08-12 06:44:00,491 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 from AS
2024-08-12 06:44:03,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1507760.0, ans=0.0
2024-08-12 06:44:12,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.61 vs. limit=15.0
2024-08-12 06:44:17,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1507860.0, ans=0.125
2024-08-12 06:44:20,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2024-08-12 06:44:21,237 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 22 from Vox, 30 from AS
2024-08-12 06:44:24,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1507960.0, ans=0.0
2024-08-12 06:44:55,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.480e+01 2.777e+01 3.215e+01 5.489e+01, threshold=5.554e+01, percent-clipped=0.0
2024-08-12 06:45:06,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1508260.0, ans=0.09899494936611666
2024-08-12 06:45:06,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5900, loss[loss=0.09507, beats_loss=0.01266, ecapa_loss=0.0001614, whisper_loss=0.08079, over 17110.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001802, whisper_loss=0.09224, over 3864644.27 frames. ], batch size: 68, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:45:10,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1508260.0, ans=0.0
2024-08-12 06:45:17,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1508260.0, ans=0.2
2024-08-12 06:45:20,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.08 vs. limit=10.0
2024-08-12 06:45:36,826 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 23 from Vox, 40 from AS
2024-08-12 06:45:38,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1508460.0, ans=0.95
2024-08-12 06:45:41,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1508460.0, ans=0.2
2024-08-12 06:45:50,574 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 from AS
2024-08-12 06:45:53,326 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 from AS
2024-08-12 06:45:59,265 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 from AS
2024-08-12 06:46:00,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0
2024-08-12 06:46:16,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1508760.0, ans=0.07
2024-08-12 06:46:16,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 5950, loss[loss=0.125, beats_loss=0.009525, ecapa_loss=0.0002174, whisper_loss=0.1133, over 21608.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0111, ecapa_loss=0.0001801, whisper_loss=0.09152, over 3858157.89 frames. ], batch size: 88, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:46:26,142 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 06:46:31,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1508860.0, ans=0.2
2024-08-12 06:46:32,595 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS
2024-08-12 06:46:34,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0
2024-08-12 06:46:36,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1508860.0, ans=0.2
2024-08-12 06:46:46,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=12.0
2024-08-12 06:47:04,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1509060.0, ans=0.0
2024-08-12 06:47:09,866 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 from AS
2024-08-12 06:47:10,210 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 06:47:15,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.882e+01 3.325e+01 5.467e+01, threshold=5.764e+01, percent-clipped=0.0
2024-08-12 06:47:17,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1509160.0, ans=0.0
2024-08-12 06:47:25,555 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 31 from Vox, 30 from AS
2024-08-12 06:47:26,638 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6000, loss[loss=0.1059, beats_loss=0.009297, ecapa_loss=0.0002283, whisper_loss=0.09427, over 20638.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01116, ecapa_loss=0.000179, whisper_loss=0.09142, over 3847368.06 frames. ], batch size: 87, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:47:26,638 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-12 06:48:09,291 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000598, whisper_loss=0.2484, over 922467.00 frames.
2024-08-12 06:48:27,525 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on SV_voxceleb1: loss=0.004893, beats_loss=0, ecapa_loss=0.0004893, whisper_loss=0, over 939242.00 frames.
2024-08-12 06:50:30,740 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on AT_audioset: loss=0.02461, beats_loss=0.02461, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 06:50:30,744 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-12 06:50:32,599 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.232e-02
2024-08-12 06:50:58,135 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 29 from Vox, 40 from AS
2024-08-12 06:51:05,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1509460.0, ans=0.1
2024-08-12 06:51:06,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1509460.0, ans=0.0
2024-08-12 06:51:31,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1509660.0, ans=0.05
2024-08-12 06:51:34,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1509660.0, ans=0.125
2024-08-12 06:51:41,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6050, loss[loss=0.07514, beats_loss=0.01308, ecapa_loss=0.0001753, whisper_loss=0.0603, over 14507.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01113, ecapa_loss=0.000179, whisper_loss=0.09148, over 3809643.98 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:51:55,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1509860.0, ans=0.2
2024-08-12 06:52:06,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1509860.0, ans=0.07
2024-08-12 06:52:19,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1509960.0, ans=0.1
2024-08-12 06:52:23,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1510060.0, ans=0.1
2024-08-12 06:52:39,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.536e+01 2.768e+01 3.094e+01 4.494e+01, threshold=5.536e+01, percent-clipped=0.0
2024-08-12 06:52:45,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1510160.0, ans=0.125
2024-08-12 06:52:50,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6100, loss[loss=0.09641, beats_loss=0.01194, ecapa_loss=0.0001781, whisper_loss=0.08269, over 17023.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0111, ecapa_loss=0.0001797, whisper_loss=0.09172, over 3832647.54 frames. ], batch size: 67, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:52:51,102 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 15 from Vox, 33 from AS
2024-08-12 06:53:11,536 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 29 from Vox, 40 from AS
2024-08-12 06:53:11,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1510360.0, ans=0.125
2024-08-12 06:53:34,128 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.282e-01
2024-08-12 06:53:36,305 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 14 from LS+wenet, 23 from Vox, 44 from AS
2024-08-12 06:53:45,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1510660.0, ans=0.0
2024-08-12 06:53:49,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0
2024-08-12 06:53:57,632 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 24 from Vox, 19 from AS
2024-08-12 06:54:00,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6150, loss[loss=0.1003, beats_loss=0.0122, ecapa_loss=0.0001661, whisper_loss=0.08644, over 16951.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001796, whisper_loss=0.09145, over 3838444.84 frames. ], batch size: 70, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:54:08,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510760.0, ans=0.1
2024-08-12 06:54:20,596 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS
2024-08-12 06:54:23,475 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 06:54:28,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1510960.0, ans=0.2
2024-08-12 06:54:30,350 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0
2024-08-12 06:54:42,109 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 40 from LS+wenet, 18 from Vox, 31 from AS
2024-08-12 06:54:46,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1511060.0, ans=0.125
2024-08-12 06:54:56,147 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS
2024-08-12 06:54:57,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.648e+01 2.944e+01 3.398e+01 5.258e+01, threshold=5.887e+01, percent-clipped=0.0
2024-08-12 06:55:03,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1511160.0, ans=0.125
2024-08-12 06:55:07,326 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 24 from Vox, 19 from AS
2024-08-12 06:55:08,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6200, loss[loss=0.09991, beats_loss=0.008001, ecapa_loss=0.0002591, whisper_loss=0.08932, over 15177.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001798, whisper_loss=0.09191, over 3851705.84 frames. ], batch size: 60, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:55:10,067 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 from AS
2024-08-12 06:55:31,943 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS
2024-08-12 06:55:33,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1511360.0, ans=0.125
2024-08-12 06:55:40,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1511460.0, ans=0.0
2024-08-12 06:55:41,616 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS
2024-08-12 06:56:00,692 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 from AS
2024-08-12 06:56:01,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5
2024-08-12 06:56:10,259 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 from AS
2024-08-12 06:56:17,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6250, loss[loss=0.09692, beats_loss=0.0105, ecapa_loss=0.0001675, whisper_loss=0.08475, over 17424.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01112, ecapa_loss=0.0001793, whisper_loss=0.0912, over 3833872.68 frames. ], batch size: 68, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:56:31,692 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 from AS
2024-08-12 06:57:02,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-08-12 06:57:16,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.481e+01 2.801e+01 3.369e+01 5.530e+01, threshold=5.602e+01, percent-clipped=0.0
2024-08-12 06:57:20,850 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 33 from Vox, 31 from AS
2024-08-12 06:57:28,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6300, loss[loss=0.1172, beats_loss=0.01072, ecapa_loss=0.0001849, whisper_loss=0.1046, over 22587.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.0001796, whisper_loss=0.09162, over 3839032.21 frames. ], batch size: 87, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:57:29,552 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 10 from Vox, 29 from AS
2024-08-12 06:57:32,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1512260.0, ans=0.2
2024-08-12 06:57:35,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1512260.0, ans=0.0
2024-08-12 06:57:58,201 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 from AS
2024-08-12 06:58:02,306 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 06:58:07,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1512460.0, ans=0.125
2024-08-12 06:58:11,604 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 06:58:32,915 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 from AS
2024-08-12 06:58:33,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=15.0
2024-08-12 06:58:37,697 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 06:58:40,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6350, loss[loss=0.1064, beats_loss=0.01278, ecapa_loss=0.0001713, whisper_loss=0.09194, over 20833.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001793, whisper_loss=0.09186, over 3827437.02 frames. ], batch size: 84, lr: 5.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 06:58:52,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1512760.0, ans=0.0
2024-08-12 06:58:55,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1512860.0, ans=0.1
2024-08-12 06:59:08,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1512960.0, ans=0.0
2024-08-12 06:59:10,418 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 06:59:16,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1512960.0, ans=0.0
2024-08-12 06:59:17,738 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 from AS
2024-08-12 06:59:22,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1512960.0, ans=0.1
2024-08-12 06:59:26,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0
2024-08-12 06:59:42,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.576e+01 2.857e+01 3.160e+01 6.267e+01, threshold=5.713e+01, percent-clipped=1.0
2024-08-12 06:59:46,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1513160.0, ans=0.1
2024-08-12 06:59:53,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6400, loss[loss=0.1178, beats_loss=0.009773, ecapa_loss=0.0001751, whisper_loss=0.1063, over 22077.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001782, whisper_loss=0.09238, over 3875732.49 frames. ], batch size: 88, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:00:21,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1513460.0, ans=0.1
2024-08-12 07:00:27,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1513460.0, ans=0.125
2024-08-12 07:00:34,479 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 from AS
2024-08-12 07:00:56,130 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 from AS
2024-08-12 07:01:07,213 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6450, loss[loss=0.1345, beats_loss=0.01102, ecapa_loss=0.0001609, whisper_loss=0.1219, over 24212.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01116, ecapa_loss=0.0001789, whisper_loss=0.0913, over 3900874.14 frames. ], batch size: 91, lr: 5.75e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:01:08,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1513760.0, ans=0.1
2024-08-12 07:01:08,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1513760.0, ans=0.125
2024-08-12 07:01:20,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-12 07:01:22,913 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 23 from LS+wenet, 9 from Vox, 33 from AS
2024-08-12 07:01:28,007 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.609e-03
2024-08-12 07:01:32,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1513860.0, ans=0.125
2024-08-12 07:01:36,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1513860.0, ans=0.0
2024-08-12 07:01:47,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1513960.0, ans=0.125
2024-08-12 07:01:59,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1514060.0, ans=0.125
2024-08-12 07:02:10,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.620e+01 2.898e+01 3.369e+01 4.608e+01, threshold=5.796e+01, percent-clipped=0.0
2024-08-12 07:02:14,509 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 07:02:21,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1514260.0, ans=0.05
2024-08-12
07:02:22,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6500, loss[loss=0.1211, beats_loss=0.009944, ecapa_loss=0.0001837, whisper_loss=0.1093, over 20768.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01116, ecapa_loss=0.0001789, whisper_loss=0.09249, over 3929495.21 frames. ], batch size: 82, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:03:08,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1514560.0, ans=0.1 2024-08-12 07:03:10,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1514560.0, ans=0.0 2024-08-12 07:03:24,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1514660.0, ans=0.125 2024-08-12 07:03:24,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1514660.0, ans=0.07 2024-08-12 07:03:30,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1514660.0, ans=0.1 2024-08-12 07:03:37,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6550, loss[loss=0.1327, beats_loss=0.007174, ecapa_loss=0.0001641, whisper_loss=0.1239, over 16636.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001795, whisper_loss=0.09236, over 3908775.64 frames. ], batch size: 60, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:04:05,126 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 07:04:14,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1514960.0, ans=0.05 2024-08-12 07:04:18,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-12 07:04:29,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1515060.0, ans=0.04949747468305833 2024-08-12 07:04:44,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1515160.0, ans=0.125 2024-08-12 07:04:47,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.605e+01 2.821e+01 3.389e+01 5.277e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 07:05:02,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6600, loss[loss=0.1079, beats_loss=0.01095, ecapa_loss=0.0001609, whisper_loss=0.09532, over 16374.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01115, ecapa_loss=0.0001812, whisper_loss=0.09184, over 3888599.62 frames. ], batch size: 65, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:05:14,269 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 07:05:28,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1515360.0, ans=0.125 2024-08-12 07:05:30,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1515360.0, ans=0.95 2024-08-12 07:05:32,022 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 07:05:36,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1515460.0, ans=0.0 2024-08-12 07:05:42,692 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 07:05:45,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0 2024-08-12 07:05:52,597 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 07:06:12,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1515660.0, ans=0.125 2024-08-12 07:06:15,522 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-12 07:06:22,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6650, loss[loss=0.1256, beats_loss=0.01212, ecapa_loss=0.0002112, whisper_loss=0.1114, over 14280.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01115, ecapa_loss=0.000181, whisper_loss=0.09196, over 3883710.06 frames. 
], batch size: 58, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:06:38,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1515760.0, ans=0.125 2024-08-12 07:07:12,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1515960.0, ans=0.2 2024-08-12 07:07:13,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1515960.0, ans=0.0 2024-08-12 07:07:19,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1515960.0, ans=0.1 2024-08-12 07:07:32,597 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-12 07:07:32,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1516060.0, ans=0.1 2024-08-12 07:07:48,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.716e+01 3.038e+01 3.399e+01 5.348e+01, threshold=6.076e+01, percent-clipped=0.0 2024-08-12 07:08:06,463 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6700, loss[loss=0.09164, beats_loss=0.01232, ecapa_loss=0.0001509, whisper_loss=0.07781, over 16027.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001813, whisper_loss=0.0925, over 3863745.86 frames. ], batch size: 64, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:08:22,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-12 07:08:51,158 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 07:09:03,748 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
38 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 07:09:04,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1516560.0, ans=0.0 2024-08-12 07:09:11,473 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-12 07:09:29,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1516660.0, ans=0.125 2024-08-12 07:09:35,771 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 07:09:43,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1516760.0, ans=0.125 2024-08-12 07:09:43,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6750, loss[loss=0.1083, beats_loss=0.01211, ecapa_loss=0.0001773, whisper_loss=0.09443, over 22362.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001825, whisper_loss=0.09269, over 3872266.95 frames. ], batch size: 93, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:09:55,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1516760.0, ans=0.125 2024-08-12 07:10:05,832 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 16 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 07:10:20,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1516860.0, ans=0.125 2024-08-12 07:11:07,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.593e+01 2.755e+01 3.178e+01 4.521e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-12 07:11:07,235 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 07:11:13,393 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 07:11:24,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6800, loss[loss=0.1145, beats_loss=0.01081, ecapa_loss=0.0002147, whisper_loss=0.1015, over 21887.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01101, ecapa_loss=0.0001828, whisper_loss=0.09266, over 3878232.17 frames. ], batch size: 91, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:11:52,467 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 07:12:06,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1517460.0, ans=0.0 2024-08-12 07:12:21,913 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 07:12:22,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1517560.0, ans=0.125 2024-08-12 07:12:30,390 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 07:12:35,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1517660.0, ans=0.0 2024-08-12 07:12:41,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6850, loss[loss=0.08935, beats_loss=0.01191, ecapa_loss=0.0001445, whisper_loss=0.07599, over 20568.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001824, whisper_loss=0.09181, over 3850785.69 frames. 
], batch size: 81, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:12:42,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1517760.0, ans=0.125 2024-08-12 07:12:44,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1517760.0, ans=0.125 2024-08-12 07:12:44,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1517760.0, ans=0.1 2024-08-12 07:12:44,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1517760.0, ans=0.2 2024-08-12 07:12:47,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1517760.0, ans=0.2 2024-08-12 07:13:11,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1517960.0, ans=0.0 2024-08-12 07:13:18,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1517960.0, ans=0.125 2024-08-12 07:13:27,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1518060.0, ans=0.125 2024-08-12 07:13:34,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1518060.0, ans=0.125 2024-08-12 07:13:42,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.608e+01 2.859e+01 3.356e+01 1.905e+02, threshold=5.718e+01, percent-clipped=1.0 2024-08-12 07:13:51,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1518160.0, ans=0.125 2024-08-12 07:13:53,866 INFO [train_multi_KD3.py:1116] 
(1/4) Epoch 11, batch 6900, loss[loss=0.1203, beats_loss=0.008384, ecapa_loss=0.0001839, whisper_loss=0.11, over 24022.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01098, ecapa_loss=0.000183, whisper_loss=0.09251, over 3848244.74 frames. ], batch size: 93, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:14:32,050 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 07:14:36,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1518560.0, ans=0.0 2024-08-12 07:14:38,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1518560.0, ans=0.125 2024-08-12 07:14:46,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1518560.0, ans=0.125 2024-08-12 07:14:51,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2024-08-12 07:14:55,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1518660.0, ans=0.1 2024-08-12 07:15:02,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1518660.0, ans=0.95 2024-08-12 07:15:04,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1518760.0, ans=0.125 2024-08-12 07:15:05,779 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 6950, loss[loss=0.1084, beats_loss=0.009246, ecapa_loss=0.0001955, whisper_loss=0.09724, over 23278.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01102, ecapa_loss=0.0001818, whisper_loss=0.09282, over 3888219.23 frames. 
], batch size: 90, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:15:46,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1519060.0, ans=0.125 2024-08-12 07:16:04,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.463e+01 2.859e+01 3.118e+01 2.003e+02, threshold=5.718e+01, percent-clipped=2.0 2024-08-12 07:16:08,384 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 07:16:14,924 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7000, loss[loss=0.08795, beats_loss=0.01107, ecapa_loss=0.0001809, whisper_loss=0.07508, over 20341.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01093, ecapa_loss=0.0001826, whisper_loss=0.09304, over 3879223.22 frames. ], batch size: 82, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:16:18,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1519260.0, ans=0.1 2024-08-12 07:16:45,044 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 07:16:47,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2024-08-12 07:16:48,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.59 vs. limit=8.0 2024-08-12 07:16:50,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1519460.0, ans=0.125 2024-08-12 07:17:06,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1519560.0, ans=0.125 2024-08-12 07:17:10,344 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 07:17:25,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7050, loss[loss=0.1261, beats_loss=0.01006, ecapa_loss=0.0001445, whisper_loss=0.1146, over 24497.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01093, ecapa_loss=0.0001826, whisper_loss=0.09319, over 3896549.49 frames. ], batch size: 91, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:17:44,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1519860.0, ans=6.0 2024-08-12 07:17:48,127 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 07:17:54,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1519960.0, ans=0.0 2024-08-12 07:18:09,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1519960.0, ans=0.1 2024-08-12 07:18:11,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-12 07:18:28,960 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.499e+01 2.770e+01 3.110e+01 4.662e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 07:18:34,819 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 07:18:36,293 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 07:18:38,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=15.0 2024-08-12 07:18:40,182 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7100, loss[loss=0.1033, beats_loss=0.01158, ecapa_loss=0.000149, whisper_loss=0.09024, over 22198.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01095, ecapa_loss=0.0001818, whisper_loss=0.09326, over 3883697.31 frames. ], batch size: 87, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:18:50,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1520260.0, ans=0.025 2024-08-12 07:18:56,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-08-12 07:18:57,648 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.272e+05 2024-08-12 07:19:17,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1520460.0, ans=0.125 2024-08-12 07:19:21,798 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 07:19:25,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1520560.0, ans=0.125 2024-08-12 07:19:43,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1520660.0, ans=0.0 2024-08-12 07:19:54,528 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7150, loss[loss=0.08898, beats_loss=0.01296, ecapa_loss=0.0001721, whisper_loss=0.0743, over 21008.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01085, ecapa_loss=0.0001827, whisper_loss=0.09399, over 3899451.79 frames. 
], batch size: 87, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:20:02,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1520760.0, ans=0.0 2024-08-12 07:20:17,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1520860.0, ans=0.1 2024-08-12 07:20:21,330 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 07:20:27,154 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 07:20:30,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1520960.0, ans=0.5 2024-08-12 07:20:44,853 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:20:49,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-12 07:20:53,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1521160.0, ans=0.125 2024-08-12 07:20:55,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.570e+01 2.914e+01 3.125e+01 1.770e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 07:20:56,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1521160.0, ans=0.0 2024-08-12 07:21:07,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7200, loss[loss=0.1309, beats_loss=0.0079, ecapa_loss=0.0002371, whisper_loss=0.1206, over 16660.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0109, ecapa_loss=0.0001818, whisper_loss=0.09386, over 3893602.90 frames. 
], batch size: 68, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:21:15,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1521260.0, ans=0.1 2024-08-12 07:21:19,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1521260.0, ans=0.0 2024-08-12 07:21:31,470 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 07:21:33,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1521360.0, ans=0.125 2024-08-12 07:21:40,694 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 07:21:42,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-12 07:22:06,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1521660.0, ans=0.0 2024-08-12 07:22:07,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1521660.0, ans=0.025 2024-08-12 07:22:22,208 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7250, loss[loss=0.136, beats_loss=0.007606, ecapa_loss=0.0001835, whisper_loss=0.1266, over 15720.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01088, ecapa_loss=0.0001816, whisper_loss=0.0941, over 3898970.73 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:22:32,060 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 07:22:41,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1521860.0, ans=0.125 2024-08-12 07:22:44,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1521860.0, ans=0.0 2024-08-12 07:22:50,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-12 07:23:14,632 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 07:23:14,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1522060.0, ans=0.0 2024-08-12 07:23:22,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1522160.0, ans=0.0 2024-08-12 07:23:24,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.489e+01 2.804e+01 3.145e+01 4.718e+01, threshold=5.607e+01, percent-clipped=0.0 2024-08-12 07:23:31,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1522160.0, ans=0.125 2024-08-12 07:23:34,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1522160.0, ans=0.1 2024-08-12 07:23:36,428 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7300, loss[loss=0.09757, beats_loss=0.009648, ecapa_loss=0.000181, whisper_loss=0.08611, over 15419.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01087, ecapa_loss=0.0001821, whisper_loss=0.09396, over 3865287.48 frames. 
], batch size: 58, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:23:39,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1522260.0, ans=0.035 2024-08-12 07:23:43,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-08-12 07:23:44,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1522260.0, ans=0.125 2024-08-12 07:23:51,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1522360.0, ans=0.0 2024-08-12 07:24:12,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1522460.0, ans=0.0 2024-08-12 07:24:12,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-12 07:24:34,167 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 07:24:35,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1522660.0, ans=0.1 2024-08-12 07:24:47,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1522660.0, ans=0.125 2024-08-12 07:24:49,939 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7350, loss[loss=0.1286, beats_loss=0.01077, ecapa_loss=0.0001844, whisper_loss=0.1159, over 22715.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01089, ecapa_loss=0.0001824, whisper_loss=0.09334, over 3840493.63 frames. ], batch size: 90, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:25:02,115 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 07:25:06,879 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.273e+00 2024-08-12 07:25:11,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1522860.0, ans=0.0 2024-08-12 07:25:20,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1522960.0, ans=0.125 2024-08-12 07:25:28,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1522960.0, ans=0.0 2024-08-12 07:25:39,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1523060.0, ans=0.125 2024-08-12 07:25:40,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1523060.0, ans=0.0 2024-08-12 07:25:42,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1523060.0, ans=0.09899494936611666 2024-08-12 07:25:44,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.44 vs. limit=22.5 2024-08-12 07:25:47,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1523060.0, ans=0.125 2024-08-12 07:25:52,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.567e+01 3.033e+01 3.476e+01 4.624e+01, threshold=6.066e+01, percent-clipped=0.0 2024-08-12 07:25:54,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1523160.0, ans=0.0 2024-08-12 07:26:01,411 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-12 07:26:03,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1523260.0, ans=0.95 2024-08-12 07:26:03,996 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7400, loss[loss=0.1187, beats_loss=0.0103, ecapa_loss=0.000168, whisper_loss=0.1068, over 17766.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01091, ecapa_loss=0.0001819, whisper_loss=0.09292, over 3844346.97 frames. ], batch size: 68, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:26:05,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2024-08-12 07:26:07,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1523260.0, ans=0.1 2024-08-12 07:26:10,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523260.0, ans=0.1 2024-08-12 07:26:10,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2024-08-12 07:26:11,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1523260.0, ans=10.0 2024-08-12 07:26:18,718 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 07:26:23,307 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
20 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-12 07:26:32,200 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.728e-01 2024-08-12 07:26:33,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1523460.0, ans=0.0 2024-08-12 07:26:38,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1523460.0, ans=0.125 2024-08-12 07:26:45,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1523460.0, ans=0.1 2024-08-12 07:27:09,832 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-12 07:27:11,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2024-08-12 07:27:15,696 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 07:27:17,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:17,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-12 07:27:18,240 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7450, loss[loss=0.1013, beats_loss=0.00913, ecapa_loss=0.0001666, whisper_loss=0.09048, over 17440.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01096, ecapa_loss=0.0001812, whisper_loss=0.09325, over 3849170.79 frames. ], batch size: 67, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:27:21,474 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
36 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 07:27:35,125 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 07:27:42,464 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 34 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 07:27:45,437 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:27:53,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2024-08-12 07:28:20,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.662e+01 2.945e+01 3.324e+01 4.940e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 07:28:21,251 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 07:28:29,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1524160.0, ans=0.125 2024-08-12 07:28:31,832 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7500, loss[loss=0.06904, beats_loss=0.01323, ecapa_loss=0.0001567, whisper_loss=0.05425, over 14536.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01097, ecapa_loss=0.0001812, whisper_loss=0.09349, over 3871310.98 frames. ], batch size: 59, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:28:33,480 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 07:28:37,660 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-12 07:28:40,682 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 07:28:59,594 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 07:29:09,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1524460.0, ans=0.0 2024-08-12 07:29:14,849 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 07:29:25,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=15.0 2024-08-12 07:29:43,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7550, loss[loss=0.1163, beats_loss=0.009835, ecapa_loss=0.0001682, whisper_loss=0.1048, over 24445.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001822, whisper_loss=0.09319, over 3863115.14 frames. ], batch size: 94, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:29:44,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-12 07:29:46,823 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 07:29:56,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1524860.0, ans=0.1 2024-08-12 07:30:09,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2024-08-12 07:30:14,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1524960.0, ans=0.125 2024-08-12 07:30:37,418 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
18 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-12 07:30:46,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.510e+01 2.746e+01 3.098e+01 2.240e+02, threshold=5.492e+01, percent-clipped=2.0 2024-08-12 07:30:51,390 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-12 07:30:58,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7600, loss[loss=0.1113, beats_loss=0.01086, ecapa_loss=0.00018, whisper_loss=0.09868, over 22734.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001815, whisper_loss=0.09235, over 3876788.29 frames. ], batch size: 90, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:31:00,156 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 07:31:00,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1525260.0, ans=0.125 2024-08-12 07:31:05,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1525260.0, ans=0.125 2024-08-12 07:31:12,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1525360.0, ans=0.0 2024-08-12 07:31:13,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1525360.0, ans=0.1 2024-08-12 07:31:27,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1525460.0, ans=0.125 2024-08-12 07:31:55,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.06 vs. 
limit=15.0 2024-08-12 07:31:59,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1525660.0, ans=0.125 2024-08-12 07:32:00,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1525660.0, ans=0.125 2024-08-12 07:32:05,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-12 07:32:12,202 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7650, loss[loss=0.1162, beats_loss=0.009188, ecapa_loss=0.0001567, whisper_loss=0.1054, over 15716.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001807, whisper_loss=0.09203, over 3869543.48 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:32:23,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1525760.0, ans=0.125 2024-08-12 07:32:25,075 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-08-12 07:32:35,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1525860.0, ans=0.125 2024-08-12 07:32:40,061 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 40 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 07:32:40,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. 
limit=15.0 2024-08-12 07:32:47,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1525960.0, ans=0.125 2024-08-12 07:32:54,228 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=8.0 2024-08-12 07:33:12,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1526160.0, ans=0.125 2024-08-12 07:33:13,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.517e+01 2.819e+01 3.143e+01 1.705e+02, threshold=5.638e+01, percent-clipped=1.0 2024-08-12 07:33:20,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1526160.0, ans=10.0 2024-08-12 07:33:25,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7700, loss[loss=0.1051, beats_loss=0.00998, ecapa_loss=0.0001704, whisper_loss=0.09344, over 19697.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01113, ecapa_loss=0.0001796, whisper_loss=0.09194, over 3895776.13 frames. ], batch size: 76, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:33:30,028 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 07:34:20,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1526560.0, ans=0.1 2024-08-12 07:34:22,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1526560.0, ans=0.1 2024-08-12 07:34:33,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1526660.0, ans=0.1 2024-08-12 07:34:42,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7750, loss[loss=0.08665, beats_loss=0.01304, ecapa_loss=0.0001634, whisper_loss=0.07198, over 21154.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01113, ecapa_loss=0.0001802, whisper_loss=0.09116, over 3900113.39 frames. ], batch size: 88, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:34:47,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1526760.0, ans=0.0 2024-08-12 07:35:27,044 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-12 07:35:39,450 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 07:35:44,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.468e+01 2.726e+01 3.182e+01 4.341e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 07:35:56,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7800, loss[loss=0.1167, beats_loss=0.009283, ecapa_loss=0.0001868, whisper_loss=0.1055, over 21662.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01115, ecapa_loss=0.0001804, whisper_loss=0.0913, over 3923697.11 frames. 
], batch size: 87, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:36:01,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=15.0 2024-08-12 07:36:03,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0 2024-08-12 07:36:09,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1527260.0, ans=0.1 2024-08-12 07:36:23,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5 2024-08-12 07:36:45,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1527560.0, ans=0.0 2024-08-12 07:36:46,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1527560.0, ans=0.125 2024-08-12 07:36:52,792 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 07:36:54,157 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 07:36:57,270 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 07:36:57,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1527660.0, ans=0.125 2024-08-12 07:37:09,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7850, loss[loss=0.08517, beats_loss=0.01329, ecapa_loss=0.0002051, whisper_loss=0.06983, over 13642.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001816, whisper_loss=0.0915, over 3931689.73 frames. 
], batch size: 57, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:37:10,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1527760.0, ans=0.125 2024-08-12 07:37:14,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1527760.0, ans=0.0 2024-08-12 07:37:14,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=22.5 2024-08-12 07:37:15,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1527760.0, ans=0.125 2024-08-12 07:37:33,328 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 07:38:12,277 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 07:38:13,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.514e+01 2.915e+01 3.388e+01 6.482e+01, threshold=5.829e+01, percent-clipped=1.0 2024-08-12 07:38:20,801 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 07:38:21,069 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:38:23,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2024-08-12 07:38:24,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1528260.0, ans=0.125 2024-08-12 07:38:25,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7900, loss[loss=0.1171, beats_loss=0.01014, ecapa_loss=0.0001844, whisper_loss=0.1051, over 21573.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01102, ecapa_loss=0.0001823, whisper_loss=0.09288, over 3941811.74 frames. ], batch size: 86, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:38:28,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1528260.0, ans=0.125 2024-08-12 07:38:31,028 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 07:38:45,828 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 07:39:04,629 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 07:39:15,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1528560.0, ans=0.125 2024-08-12 07:39:35,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1528660.0, ans=0.1 2024-08-12 07:39:37,938 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 7950, loss[loss=0.1001, beats_loss=0.01007, ecapa_loss=0.0002262, whisper_loss=0.08777, over 16933.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01105, ecapa_loss=0.000182, whisper_loss=0.09256, over 3921672.46 frames. ], batch size: 71, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:40:09,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2024-08-12 07:40:09,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-08-12 07:40:11,936 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
32 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-12 07:40:14,901 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 07:40:19,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1528960.0, ans=0.125 2024-08-12 07:40:22,895 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.272e+02 2024-08-12 07:40:29,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1529060.0, ans=0.125 2024-08-12 07:40:36,760 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 07:40:39,584 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 07:40:40,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.548e+01 3.030e+01 3.373e+01 4.598e+01, threshold=6.060e+01, percent-clipped=0.0 2024-08-12 07:40:49,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-12 07:40:52,149 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8000, loss[loss=0.1039, beats_loss=0.007983, ecapa_loss=0.0002361, whisper_loss=0.09359, over 22177.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01088, ecapa_loss=0.0001818, whisper_loss=0.09391, over 3961608.67 frames. 
], batch size: 92, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:40:56,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1529260.0, ans=0.125 2024-08-12 07:40:57,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1529260.0, ans=0.5 2024-08-12 07:41:03,433 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 07:41:06,583 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.533e-03 2024-08-12 07:41:25,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1529460.0, ans=0.0 2024-08-12 07:41:27,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-12 07:41:40,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-12 07:41:51,230 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 07:41:54,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1529660.0, ans=0.0 2024-08-12 07:42:01,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1529660.0, ans=0.125 2024-08-12 07:42:01,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1529660.0, ans=0.0 2024-08-12 07:42:03,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1529660.0, ans=15.0 2024-08-12 07:42:07,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8050, loss[loss=0.0966, beats_loss=0.01033, ecapa_loss=0.0002087, whisper_loss=0.08418, over 18795.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01084, ecapa_loss=0.0001825, whisper_loss=0.09364, over 3906038.98 frames. ], batch size: 79, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:42:12,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1529760.0, ans=0.125 2024-08-12 07:42:35,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1529960.0, ans=0.125 2024-08-12 07:42:42,725 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-12 07:43:03,597 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 07:43:08,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.431e+01 2.661e+01 3.080e+01 6.684e+01, threshold=5.323e+01, percent-clipped=1.0 2024-08-12 07:43:10,680 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 07:43:14,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.04 vs. limit=5.0 2024-08-12 07:43:21,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8100, loss[loss=0.09646, beats_loss=0.01386, ecapa_loss=0.0001432, whisper_loss=0.08118, over 16541.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01088, ecapa_loss=0.0001826, whisper_loss=0.09288, over 3886062.06 frames. ], batch size: 65, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:43:25,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1530260.0, ans=0.125 2024-08-12 07:43:28,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1530260.0, ans=0.1 2024-08-12 07:43:30,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1530260.0, ans=0.1 2024-08-12 07:43:31,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1530260.0, ans=15.0 2024-08-12 07:43:51,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=1530460.0, ans=0.1 2024-08-12 07:43:53,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-12 07:44:02,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1530460.0, ans=0.125 2024-08-12 07:44:07,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.93 vs. 
limit=15.0 2024-08-12 07:44:09,725 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-12 07:44:21,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=12.0 2024-08-12 07:44:22,106 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 07:44:25,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1530660.0, ans=0.0 2024-08-12 07:44:37,200 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8150, loss[loss=0.1191, beats_loss=0.01048, ecapa_loss=0.0001896, whisper_loss=0.1067, over 21500.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001818, whisper_loss=0.0928, over 3904120.42 frames. ], batch size: 88, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:44:44,749 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 07:44:44,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1530760.0, ans=0.0 2024-08-12 07:44:50,926 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 07:44:51,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1530860.0, ans=0.09899494936611666 2024-08-12 07:44:52,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1530860.0, ans=0.0 2024-08-12 07:44:55,064 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 07:45:02,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1530860.0, ans=0.0 2024-08-12 07:45:14,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1530960.0, ans=0.125 2024-08-12 07:45:38,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.559e+01 2.858e+01 3.192e+01 6.698e+01, threshold=5.715e+01, percent-clipped=1.0 2024-08-12 07:45:43,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1531160.0, ans=0.1 2024-08-12 07:45:50,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8200, loss[loss=0.1025, beats_loss=0.008858, ecapa_loss=0.000193, whisper_loss=0.09169, over 22022.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001808, whisper_loss=0.09255, over 3933511.02 frames. ], batch size: 91, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:46:09,163 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 07:46:15,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2024-08-12 07:46:26,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1531460.0, ans=0.125 2024-08-12 07:46:27,499 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 19 from Vox, 34 from AS
2024-08-12 07:46:51,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1531660.0, ans=0.125
2024-08-12 07:46:51,979 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.019e+00
2024-08-12 07:46:58,920 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 26 from Vox, 28 from AS
2024-08-12 07:47:01,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8250, loss[loss=0.1041, beats_loss=0.01002, ecapa_loss=0.0002161, whisper_loss=0.09195, over 21413.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.0001798, whisper_loss=0.09288, over 3946204.27 frames. ], batch size: 90, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:47:02,078 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 07:47:04,795 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 from AS
2024-08-12 07:47:29,966 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 12 from Vox, 36 from AS
2024-08-12 07:47:40,046 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 31 from Vox, 33 from AS
2024-08-12 07:48:03,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.593e+01 2.850e+01 3.386e+01 5.334e+01, threshold=5.700e+01, percent-clipped=0.0
2024-08-12 07:48:14,113 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8300, loss[loss=0.08712, beats_loss=0.01425, ecapa_loss=0.0001285, whisper_loss=0.07158, over 19742.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01105, ecapa_loss=0.0001785, whisper_loss=0.09247, over 3942916.05 frames. ], batch size: 78, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:48:44,469 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 from AS
2024-08-12 07:48:53,260 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 23 from Vox, 22 from AS
2024-08-12 07:48:53,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1532460.0, ans=0.1
2024-08-12 07:49:04,161 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 from AS
2024-08-12 07:49:19,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0
2024-08-12 07:49:23,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8350, loss[loss=0.1011, beats_loss=0.01234, ecapa_loss=0.0001607, whisper_loss=0.08717, over 19140.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01095, ecapa_loss=0.0001823, whisper_loss=0.09299, over 3945433.43 frames. ], batch size: 73, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:49:24,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1532760.0, ans=0.2
2024-08-12 07:49:27,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1532760.0, ans=0.0
2024-08-12 07:49:28,551 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 from AS
2024-08-12 07:49:33,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1532760.0, ans=0.1
2024-08-12 07:49:46,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2024-08-12 07:49:55,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5
2024-08-12 07:50:18,522 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 from AS
2024-08-12 07:50:23,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.502e+01 2.920e+01 3.300e+01 7.763e+01, threshold=5.841e+01, percent-clipped=2.0
2024-08-12 07:50:33,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8400, loss[loss=0.1108, beats_loss=0.01005, ecapa_loss=0.0001914, whisper_loss=0.09888, over 14989.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01089, ecapa_loss=0.0001821, whisper_loss=0.09327, over 3953349.93 frames. ], batch size: 61, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:50:42,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1533260.0, ans=0.125
2024-08-12 07:50:55,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1533360.0, ans=0.2
2024-08-12 07:51:00,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1533360.0, ans=0.1
2024-08-12 07:51:01,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1533460.0, ans=0.2
2024-08-12 07:51:11,283 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 from AS
2024-08-12 07:51:20,970 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS
2024-08-12 07:51:22,514 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 31 from Vox, 39 from AS
2024-08-12 07:51:29,680 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS
2024-08-12 07:51:45,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8450, loss[loss=0.08081, beats_loss=0.01097, ecapa_loss=0.0001939, whisper_loss=0.0679, over 15953.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01083, ecapa_loss=0.0001821, whisper_loss=0.09304, over 3935189.41 frames. ], batch size: 66, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:51:55,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1533760.0, ans=0.125
2024-08-12 07:52:14,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0
2024-08-12 07:52:18,258 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS
2024-08-12 07:52:18,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=22.5
2024-08-12 07:52:27,513 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 07:52:27,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1534060.0, ans=0.05
2024-08-12 07:52:32,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1534060.0, ans=0.0
2024-08-12 07:52:37,103 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 from AS
2024-08-12 07:52:39,979 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 33 from Vox, 29 from AS
2024-08-12 07:52:46,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.460e+01 2.717e+01 3.180e+01 4.918e+01, threshold=5.434e+01, percent-clipped=0.0
2024-08-12 07:52:56,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8500, loss[loss=0.1081, beats_loss=0.01139, ecapa_loss=0.0001882, whisper_loss=0.09481, over 22176.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001808, whisper_loss=0.09245, over 3925774.40 frames. ], batch size: 90, lr: 5.72e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:52:58,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1534260.0, ans=0.0
2024-08-12 07:53:02,240 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 from AS
2024-08-12 07:53:07,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1534260.0, ans=0.0
2024-08-12 07:53:09,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1534360.0, ans=0.125
2024-08-12 07:53:15,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1534360.0, ans=0.2
2024-08-12 07:53:22,407 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-12 07:53:24,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0
2024-08-12 07:53:34,776 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 from AS
2024-08-12 07:53:54,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0
2024-08-12 07:54:00,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1534660.0, ans=0.125
2024-08-12 07:54:07,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8550, loss[loss=0.101, beats_loss=0.009712, ecapa_loss=0.0002051, whisper_loss=0.08924, over 15159.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001813, whisper_loss=0.09236, over 3909764.35 frames. ], batch size: 61, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:54:09,188 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 19 from Vox, 51 from AS
2024-08-12 07:54:12,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1534760.0, ans=0.125
2024-08-12 07:54:30,605 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 16 from Vox, 25 from AS
2024-08-12 07:54:32,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1534860.0, ans=0.0
2024-08-12 07:54:35,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1534960.0, ans=0.125
2024-08-12 07:54:35,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1534960.0, ans=0.125
2024-08-12 07:54:35,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1534960.0, ans=0.125
2024-08-12 07:54:41,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5
2024-08-12 07:54:43,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1534960.0, ans=0.125
2024-08-12 07:54:48,039 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 from AS
2024-08-12 07:54:59,716 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS
2024-08-12 07:55:09,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.569e+01 2.940e+01 3.392e+01 6.119e+01, threshold=5.880e+01, percent-clipped=2.0
2024-08-12 07:55:11,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1535160.0, ans=0.0
2024-08-12 07:55:13,945 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 32 from Vox, 29 from AS
2024-08-12 07:55:15,242 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-12 07:55:15,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1535160.0, ans=0.04949747468305833
2024-08-12 07:55:17,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1535160.0, ans=0.0
2024-08-12 07:55:19,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8600, loss[loss=0.1071, beats_loss=0.01102, ecapa_loss=0.0001792, whisper_loss=0.09428, over 15477.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01094, ecapa_loss=0.0001813, whisper_loss=0.09214, over 3913908.23 frames. ], batch size: 62, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:55:28,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1535260.0, ans=0.1
2024-08-12 07:55:32,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1535360.0, ans=0.2
2024-08-12 07:55:48,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1535460.0, ans=0.125
2024-08-12 07:56:04,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5
2024-08-12 07:56:07,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0
2024-08-12 07:56:09,508 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 from AS
2024-08-12 07:56:09,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1535560.0, ans=0.0
2024-08-12 07:56:13,632 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 from AS
2024-08-12 07:56:31,651 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8650, loss[loss=0.1213, beats_loss=0.01027, ecapa_loss=0.00017, whisper_loss=0.1093, over 22423.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001814, whisper_loss=0.09207, over 3878540.55 frames. ], batch size: 88, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:56:36,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1535760.0, ans=0.125
2024-08-12 07:56:42,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1535760.0, ans=0.125
2024-08-12 07:57:02,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1535960.0, ans=0.2
2024-08-12 07:57:09,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1535960.0, ans=0.125
2024-08-12 07:57:16,109 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 14 from Vox, 48 from AS
2024-08-12 07:57:25,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0
2024-08-12 07:57:32,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1536160.0, ans=0.125
2024-08-12 07:57:34,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.601e+01 2.885e+01 3.263e+01 5.509e+01, threshold=5.770e+01, percent-clipped=0.0
2024-08-12 07:57:38,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1536160.0, ans=0.125
2024-08-12 07:57:39,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1536160.0, ans=0.125
2024-08-12 07:57:45,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8700, loss[loss=0.1208, beats_loss=0.01115, ecapa_loss=0.0001516, whisper_loss=0.1081, over 22866.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001807, whisper_loss=0.09127, over 3875066.59 frames. ], batch size: 90, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:57:55,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1536260.0, ans=0.0
2024-08-12 07:57:56,964 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 from AS
2024-08-12 07:58:17,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1536460.0, ans=0.2
2024-08-12 07:58:18,883 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 18 from LS+wenet, 27 from Vox, 44 from AS
2024-08-12 07:58:36,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1536560.0, ans=0.125
2024-08-12 07:58:42,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1536660.0, ans=0.05
2024-08-12 07:58:48,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1536660.0, ans=0.0
2024-08-12 07:58:57,712 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8750, loss[loss=0.1056, beats_loss=0.01094, ecapa_loss=0.0001794, whisper_loss=0.09284, over 21438.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01099, ecapa_loss=0.000182, whisper_loss=0.09146, over 3886550.62 frames. ], batch size: 88, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 07:59:05,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1536760.0, ans=0.1
2024-08-12 07:59:33,901 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS
2024-08-12 07:59:35,106 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 07:59:45,302 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 from AS
2024-08-12 07:59:50,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1537060.0, ans=0.125
2024-08-12 07:59:54,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1537160.0, ans=0.1
2024-08-12 07:59:58,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1537160.0, ans=0.1
2024-08-12 07:59:59,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.459e+01 2.752e+01 3.200e+01 4.704e+01, threshold=5.505e+01, percent-clipped=0.0
2024-08-12 08:00:08,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1537260.0, ans=0.1
2024-08-12 08:00:09,642 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8800, loss[loss=0.1235, beats_loss=0.009899, ecapa_loss=0.0002295, whisper_loss=0.1114, over 20956.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001824, whisper_loss=0.09275, over 3900330.12 frames. ], batch size: 87, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:00:19,759 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS
2024-08-12 08:00:21,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1537260.0, ans=0.125
2024-08-12 08:00:23,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1537360.0, ans=0.0
2024-08-12 08:00:27,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1537360.0, ans=0.09899494936611666
2024-08-12 08:00:29,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2024-08-12 08:00:31,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1537360.0, ans=0.125
2024-08-12 08:00:51,407 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 from AS
2024-08-12 08:00:59,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2024-08-12 08:01:17,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1537660.0, ans=0.2
2024-08-12 08:01:22,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8850, loss[loss=0.09854, beats_loss=0.01162, ecapa_loss=0.0001723, whisper_loss=0.08519, over 19186.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.000182, whisper_loss=0.09193, over 3871720.07 frames. ], batch size: 77, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:01:40,652 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS
2024-08-12 08:01:42,021 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 from AS
2024-08-12 08:01:43,455 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 from AS
2024-08-12 08:01:53,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1537960.0, ans=0.1
2024-08-12 08:01:55,065 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 from AS
2024-08-12 08:02:04,077 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 19 from Vox, 53 from AS
2024-08-12 08:02:19,290 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 25 from Vox, 33 from AS
2024-08-12 08:02:24,249 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 24 from Vox, 29 from AS
2024-08-12 08:02:25,455 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 08:02:26,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.513e+01 2.817e+01 3.159e+01 3.465e+02, threshold=5.633e+01, percent-clipped=4.0
2024-08-12 08:02:28,522 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 from AS
2024-08-12 08:02:36,969 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8900, loss[loss=0.1119, beats_loss=0.008945, ecapa_loss=0.0001983, whisper_loss=0.101, over 18783.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01111, ecapa_loss=0.0001819, whisper_loss=0.09116, over 3848356.86 frames. ], batch size: 75, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:02:40,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1538260.0, ans=0.1
2024-08-12 08:02:45,557 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 from AS
2024-08-12 08:03:04,725 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 from AS
2024-08-12 08:03:50,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 8950, loss[loss=0.09797, beats_loss=0.01223, ecapa_loss=0.0001592, whisper_loss=0.08414, over 16945.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01111, ecapa_loss=0.000181, whisper_loss=0.0912, over 3852507.36 frames. ], batch size: 66, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:03:50,756 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 from AS
2024-08-12 08:04:05,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1538860.0, ans=0.1
2024-08-12 08:04:32,539 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 from AS
2024-08-12 08:04:34,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1539060.0, ans=0.0
2024-08-12 08:04:34,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0
2024-08-12 08:04:41,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1539060.0, ans=0.125
2024-08-12 08:04:52,643 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.556e+01 2.825e+01 3.281e+01 7.768e+01, threshold=5.651e+01, percent-clipped=1.0
2024-08-12 08:05:02,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9000, loss[loss=0.1289, beats_loss=0.008525, ecapa_loss=0.0001818, whisper_loss=0.1185, over 17192.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001807, whisper_loss=0.0921, over 3885501.80 frames. ], batch size: 64, lr: 5.71e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:05:02,460 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss
2024-08-12 08:05:41,696 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006109, whisper_loss=0.2495, over 922467.00 frames.
2024-08-12 08:05:59,588 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on SV_voxceleb1: loss=0.004943, beats_loss=0, ecapa_loss=0.0004943, whisper_loss=0, over 939242.00 frames.
2024-08-12 08:07:52,931 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on AT_audioset: loss=0.02436, beats_loss=0.02436, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 08:07:52,935 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-12 08:07:53,321 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS
2024-08-12 08:08:00,471 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 from AS
2024-08-12 08:08:13,252 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 from AS
2024-08-12 08:08:15,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1539360.0, ans=0.0
2024-08-12 08:08:52,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1539660.0, ans=0.125
2024-08-12 08:09:00,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1539660.0, ans=0.2
2024-08-12 08:09:05,581 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9050, loss[loss=0.1142, beats_loss=0.009074, ecapa_loss=0.0001992, whisper_loss=0.1031, over 16988.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001812, whisper_loss=0.09254, over 3871661.88 frames. ], batch size: 66, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:09:18,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1539760.0, ans=10.0
2024-08-12 08:09:22,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1539860.0, ans=0.025
2024-08-12 08:09:39,645 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS
2024-08-12 08:09:41,308 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 27 from Vox, 28 from AS
2024-08-12 08:09:41,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1539960.0, ans=0.125
2024-08-12 08:09:50,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1540060.0, ans=0.0
2024-08-12 08:09:58,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1540060.0, ans=0.1
2024-08-12 08:10:09,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.551e+01 2.907e+01 3.420e+01 5.824e+01, threshold=5.813e+01, percent-clipped=1.0
2024-08-12 08:10:10,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1540160.0, ans=0.1
2024-08-12 08:10:14,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1540160.0, ans=0.1
2024-08-12 08:10:16,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1540160.0, ans=0.025
2024-08-12 08:10:19,756 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9100, loss[loss=0.09962, beats_loss=0.009463, ecapa_loss=0.0001759, whisper_loss=0.0884, over 15555.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01095, ecapa_loss=0.000182, whisper_loss=0.09266, over 3867196.79 frames. ], batch size: 60, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:11:17,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1540660.0, ans=0.5
2024-08-12 08:11:27,531 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 from AS
2024-08-12 08:11:31,943 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS
2024-08-12 08:11:33,174 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9150, loss[loss=0.1109, beats_loss=0.01049, ecapa_loss=0.0002181, whisper_loss=0.09819, over 14843.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01096, ecapa_loss=0.0001816, whisper_loss=0.09268, over 3874155.67 frames. ], batch size: 62, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:12:01,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0
2024-08-12 08:12:15,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0
2024-08-12 08:12:19,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1541060.0, ans=0.2
2024-08-12 08:12:21,825 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 34 from Vox, 34 from AS
2024-08-12 08:12:24,550 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 from AS
2024-08-12 08:12:24,960 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.001e-01
2024-08-12 08:12:28,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1541060.0, ans=0.125
2024-08-12 08:12:29,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0
2024-08-12 08:12:35,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1541160.0, ans=0.2
2024-08-12 08:12:35,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.613e+01 2.813e+01 3.154e+01 4.389e+01, threshold=5.626e+01, percent-clipped=0.0
2024-08-12 08:12:41,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0
2024-08-12 08:12:45,120 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 from AS
2024-08-12 08:12:46,211 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9200, loss[loss=0.0857, beats_loss=0.01267, ecapa_loss=0.0001589, whisper_loss=0.07144, over 15802.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001812, whisper_loss=0.09243, over 3872571.20 frames. ], batch size: 64, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:12:46,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1541260.0, ans=0.125
2024-08-12 08:12:53,568 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 from AS
2024-08-12 08:12:58,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1541260.0, ans=0.0
2024-08-12 08:12:59,313 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS
2024-08-12 08:13:06,437 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 18 from Vox, 34 from AS
2024-08-12 08:13:56,634 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 17 from Vox, 16 from AS
2024-08-12 08:13:57,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9250, loss[loss=0.1185, beats_loss=0.007601, ecapa_loss=0.000226, whisper_loss=0.1087, over 14301.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01097, ecapa_loss=0.0001813, whisper_loss=0.0922, over 3886428.08 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:14:02,461 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 08:14:12,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1541860.0, ans=0.125
2024-08-12 08:14:12,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.92 vs. limit=22.5
2024-08-12 08:14:12,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.82 vs. limit=22.5
2024-08-12 08:14:28,437 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS
2024-08-12 08:14:35,758 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS
2024-08-12 08:14:35,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1541960.0, ans=0.0
2024-08-12 08:14:57,892 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 11 from Vox, 31 from AS
2024-08-12 08:15:00,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.527e+01 2.843e+01 3.278e+01 5.057e+01, threshold=5.687e+01, percent-clipped=0.0
2024-08-12 08:15:11,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9300, loss[loss=0.08449, beats_loss=0.01265, ecapa_loss=0.0001676, whisper_loss=0.07017, over 22582.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.0001812, whisper_loss=0.09215, over 3874559.03 frames. ], batch size: 90, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:15:15,547 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 from AS
2024-08-12 08:15:39,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.04 vs. limit=15.0
2024-08-12 08:15:45,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1542460.0, ans=0.125
2024-08-12 08:15:50,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1542460.0, ans=0.0
2024-08-12 08:15:51,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1542460.0, ans=0.04949747468305833
2024-08-12 08:15:54,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1542560.0, ans=0.0
2024-08-12 08:16:10,858 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-12 08:16:23,424 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 from AS
2024-08-12 08:16:26,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9350, loss[loss=0.1148, beats_loss=0.01075, ecapa_loss=0.0002293, whisper_loss=0.1018, over 21016.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001814, whisper_loss=0.09214, over 3859019.77 frames. ], batch size: 90, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:16:33,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1542760.0, ans=0.0
2024-08-12 08:16:39,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1542760.0, ans=0.2
2024-08-12 08:17:00,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1542960.0, ans=0.2
2024-08-12 08:17:14,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1543060.0, ans=0.0
2024-08-12 08:17:16,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0
2024-08-12 08:17:19,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0
2024-08-12 08:17:24,845 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0
2024-08-12 08:17:31,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.553e+01 2.819e+01 3.364e+01 6.243e+01, threshold=5.639e+01, percent-clipped=2.0
2024-08-12 08:17:34,828 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 08:17:35,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1543160.0, ans=0.0
2024-08-12 08:17:36,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1543160.0, ans=0.0
2024-08-12 08:17:43,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9400, loss[loss=0.06849, beats_loss=0.0124, ecapa_loss=0.0001543, whisper_loss=0.05455, over 18110.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01109, ecapa_loss=0.0001813, whisper_loss=0.09137, over 3873398.08 frames. ], batch size: 74, lr: 5.70e-03, grad_scale: 5.764607523034235e+17
2024-08-12 08:17:48,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543260.0, ans=0.125
2024-08-12 08:17:52,526 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 from AS
2024-08-12 08:17:53,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2024-08-12 08:17:54,062 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-12 08:17:59,881 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 15 from Vox, 49 from AS
2024-08-12 08:18:21,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1543460.0, ans=0.125
2024-08-12 08:18:34,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1543560.0, ans=0.125
2024-08-12 08:18:39,065 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts.
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 08:18:58,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9450, loss[loss=0.1439, beats_loss=0.008073, ecapa_loss=0.0001847, whisper_loss=0.1339, over 23561.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01112, ecapa_loss=0.0001796, whisper_loss=0.09134, over 3899032.44 frames. ], batch size: 88, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:19:00,670 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 08:19:09,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1543760.0, ans=0.0 2024-08-12 08:19:10,847 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 08:19:21,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1543860.0, ans=0.125 2024-08-12 08:19:39,823 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 08:20:01,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.483e+01 2.821e+01 3.317e+01 4.965e+01, threshold=5.642e+01, percent-clipped=0.0 2024-08-12 08:20:07,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1544160.0, ans=0.0 2024-08-12 08:20:11,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9500, loss[loss=0.12, beats_loss=0.00766, ecapa_loss=0.0002107, whisper_loss=0.1102, over 19178.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.00018, whisper_loss=0.09225, over 3891829.67 frames. ], batch size: 78, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:20:21,408 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 08:20:24,291 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 08:20:42,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-08-12 08:20:59,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1544560.0, ans=0.0 2024-08-12 08:21:02,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544560.0, ans=0.1 2024-08-12 08:21:06,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-12 08:21:10,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1544660.0, ans=0.2 2024-08-12 08:21:13,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2024-08-12 08:21:24,198 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9550, loss[loss=0.09839, beats_loss=0.008342, ecapa_loss=0.0002183, whisper_loss=0.08786, over 15071.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001799, whisper_loss=0.09145, over 3867722.06 frames. 
], batch size: 61, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:21:30,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1544760.0, ans=0.0 2024-08-12 08:21:51,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1544860.0, ans=0.2 2024-08-12 08:22:15,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1545060.0, ans=0.125 2024-08-12 08:22:26,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.581e+01 2.910e+01 3.415e+01 4.856e+01, threshold=5.819e+01, percent-clipped=0.0 2024-08-12 08:22:36,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9600, loss[loss=0.1261, beats_loss=0.009185, ecapa_loss=0.0001482, whisper_loss=0.1154, over 24239.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01108, ecapa_loss=0.000178, whisper_loss=0.09136, over 3841209.57 frames. ], batch size: 90, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:22:38,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-08-12 08:22:55,798 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-12 08:23:12,621 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 08:23:23,340 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 08:23:30,272 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 08:23:42,198 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 08:23:49,498 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9650, loss[loss=0.08837, beats_loss=0.008761, ecapa_loss=0.0002215, whisper_loss=0.0774, over 17602.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01107, ecapa_loss=0.0001805, whisper_loss=0.09128, over 3840372.51 frames. ], batch size: 72, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:23:56,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1545760.0, ans=0.025 2024-08-12 08:24:27,753 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 08:24:28,152 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.721e-03 2024-08-12 08:24:47,718 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 08:24:47,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1546160.0, ans=0.1 2024-08-12 08:24:50,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.490e+01 2.776e+01 3.280e+01 4.565e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 08:24:54,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-12 08:25:00,794 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9700, loss[loss=0.1169, beats_loss=0.009433, ecapa_loss=0.0001922, whisper_loss=0.1055, over 15583.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001808, whisper_loss=0.09147, over 3837987.57 frames. ], batch size: 60, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:25:24,686 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 08:25:32,647 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 08:25:39,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1546460.0, ans=0.2 2024-08-12 08:25:49,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1546560.0, ans=0.1 2024-08-12 08:26:10,547 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 08:26:13,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9750, loss[loss=0.09381, beats_loss=0.009968, ecapa_loss=0.0002241, whisper_loss=0.0816, over 14375.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001801, whisper_loss=0.09164, over 3859087.87 frames. ], batch size: 63, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:26:24,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1546760.0, ans=0.125 2024-08-12 08:26:43,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1546960.0, ans=0.125 2024-08-12 08:27:16,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.437e+01 2.801e+01 3.445e+01 6.244e+01, threshold=5.602e+01, percent-clipped=1.0 2024-08-12 08:27:24,490 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-12 08:27:27,296 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9800, loss[loss=0.1063, beats_loss=0.009549, ecapa_loss=0.0001794, whisper_loss=0.09496, over 13927.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01103, ecapa_loss=0.0001791, whisper_loss=0.09222, over 3827685.97 frames. 
], batch size: 53, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:27:28,891 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 08:27:46,946 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 08:27:47,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1547360.0, ans=0.2 2024-08-12 08:28:25,749 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 08:28:30,308 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 08:28:38,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1547660.0, ans=0.125 2024-08-12 08:28:42,399 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9850, loss[loss=0.1237, beats_loss=0.007266, ecapa_loss=0.0001764, whisper_loss=0.1146, over 14716.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01101, ecapa_loss=0.0001785, whisper_loss=0.09279, over 3854476.94 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:28:47,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1547760.0, ans=0.0 2024-08-12 08:28:49,068 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-12 08:29:08,500 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 08:29:10,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1547860.0, ans=0.125 2024-08-12 08:29:11,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1547960.0, ans=0.1 2024-08-12 08:29:17,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1547960.0, ans=0.125 2024-08-12 08:29:22,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1547960.0, ans=0.125 2024-08-12 08:29:27,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1548060.0, ans=0.0 2024-08-12 08:29:37,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548060.0, ans=0.1 2024-08-12 08:29:40,442 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 33 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 08:29:42,094 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 08:29:42,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1548160.0, ans=0.2 2024-08-12 08:29:47,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.546e+01 2.857e+01 3.272e+01 5.247e+01, threshold=5.713e+01, percent-clipped=0.0 2024-08-12 08:29:48,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1548160.0, ans=0.0 2024-08-12 08:29:49,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548160.0, ans=0.1 2024-08-12 08:29:57,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9900, loss[loss=0.1086, beats_loss=0.0118, ecapa_loss=0.0001754, whisper_loss=0.09501, over 22844.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01104, ecapa_loss=0.0001786, whisper_loss=0.0922, over 3865484.19 frames. ], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:29:59,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1548260.0, ans=0.125 2024-08-12 08:30:00,844 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 08:30:02,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1548260.0, ans=0.2 2024-08-12 08:30:09,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.06 vs. 
limit=15.0 2024-08-12 08:30:20,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1548360.0, ans=0.2 2024-08-12 08:30:29,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1548460.0, ans=0.125 2024-08-12 08:30:36,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=22.5 2024-08-12 08:30:57,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1548660.0, ans=0.125 2024-08-12 08:31:05,295 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 33 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 08:31:08,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1548660.0, ans=0.125 2024-08-12 08:31:10,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1548760.0, ans=0.125 2024-08-12 08:31:10,890 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 9950, loss[loss=0.09708, beats_loss=0.01117, ecapa_loss=0.0001585, whisper_loss=0.08433, over 23429.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01108, ecapa_loss=0.0001796, whisper_loss=0.09136, over 3849517.15 frames. ], batch size: 93, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:31:28,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.11 vs. limit=10.0 2024-08-12 08:31:49,972 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 08:31:51,431 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 08:32:10,378 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 08:32:13,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1549160.0, ans=0.125 2024-08-12 08:32:14,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.498e+01 2.780e+01 3.249e+01 5.152e+01, threshold=5.559e+01, percent-clipped=0.0 2024-08-12 08:32:24,584 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10000, loss[loss=0.0877, beats_loss=0.01029, ecapa_loss=0.0001953, whisper_loss=0.07545, over 19346.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01111, ecapa_loss=0.0001784, whisper_loss=0.09173, over 3876655.72 frames. ], batch size: 79, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:32:28,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1549260.0, ans=0.125 2024-08-12 08:32:29,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-12 08:32:41,211 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 08:32:41,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1549360.0, ans=0.0 2024-08-12 08:33:03,932 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 08:33:38,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10050, loss[loss=0.1322, beats_loss=0.009502, ecapa_loss=0.0001677, whisper_loss=0.121, over 22106.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001783, whisper_loss=0.09176, over 3891622.41 frames. 
], batch size: 84, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:33:43,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1549760.0, ans=0.0 2024-08-12 08:33:43,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1549760.0, ans=0.125 2024-08-12 08:33:46,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1549760.0, ans=0.0 2024-08-12 08:33:59,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1549860.0, ans=0.125 2024-08-12 08:33:59,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1549860.0, ans=0.5 2024-08-12 08:34:05,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1549860.0, ans=0.2 2024-08-12 08:34:08,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1549960.0, ans=0.2 2024-08-12 08:34:22,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1550060.0, ans=0.0 2024-08-12 08:34:24,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1550060.0, ans=0.125 2024-08-12 08:34:29,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1550060.0, ans=0.0 2024-08-12 08:34:40,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1550160.0, ans=0.125 2024-08-12 08:34:40,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.497e+01 2.870e+01 3.338e+01 
7.482e+01, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 08:34:48,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1550160.0, ans=0.0 2024-08-12 08:34:50,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1550260.0, ans=0.0 2024-08-12 08:34:51,425 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10100, loss[loss=0.1024, beats_loss=0.01142, ecapa_loss=0.0001849, whisper_loss=0.08914, over 22754.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01115, ecapa_loss=0.0001796, whisper_loss=0.09141, over 3909691.24 frames. ], batch size: 93, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:35:18,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1550360.0, ans=0.125 2024-08-12 08:36:04,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10150, loss[loss=0.131, beats_loss=0.009259, ecapa_loss=0.0002304, whisper_loss=0.1194, over 18671.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001811, whisper_loss=0.0922, over 3918517.44 frames. ], batch size: 77, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:36:23,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2024-08-12 08:36:27,850 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 08:36:34,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-08-12 08:36:45,764 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 08:37:03,454 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:37:08,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-08-12 08:37:10,139 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.557e+01 2.799e+01 3.287e+01 1.688e+02, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 08:37:16,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1551160.0, ans=0.1 2024-08-12 08:37:21,581 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10200, loss[loss=0.09287, beats_loss=0.01011, ecapa_loss=0.0002155, whisper_loss=0.0806, over 21704.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001807, whisper_loss=0.09213, over 3912410.44 frames. ], batch size: 93, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:37:26,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1551260.0, ans=0.0 2024-08-12 08:37:27,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-08-12 08:37:36,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1551360.0, ans=0.5 2024-08-12 08:37:36,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1551360.0, ans=0.125 2024-08-12 08:37:49,808 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 08:38:05,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1551460.0, ans=0.125 2024-08-12 08:38:08,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1551560.0, ans=0.0 2024-08-12 08:38:12,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1551560.0, ans=0.0 2024-08-12 08:38:14,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1551560.0, ans=0.1 2024-08-12 08:38:24,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1551660.0, ans=0.1 2024-08-12 08:38:38,558 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10250, loss[loss=0.115, beats_loss=0.009668, ecapa_loss=0.0001769, whisper_loss=0.1036, over 14179.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001806, whisper_loss=0.09219, over 3851301.60 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:38:43,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 08:38:48,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1551760.0, ans=0.0 2024-08-12 08:39:43,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1552160.0, ans=0.09899494936611666 2024-08-12 08:39:46,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.10 vs. 
limit=6.0 2024-08-12 08:39:46,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.422e+01 2.707e+01 3.104e+01 5.382e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 08:39:47,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1552160.0, ans=0.0 2024-08-12 08:39:57,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10300, loss[loss=0.119, beats_loss=0.01093, ecapa_loss=0.0001859, whisper_loss=0.1062, over 22456.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001793, whisper_loss=0.0922, over 3861657.42 frames. ], batch size: 90, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:40:07,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1552260.0, ans=0.0 2024-08-12 08:40:11,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1552360.0, ans=10.0 2024-08-12 08:40:19,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-12 08:40:29,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1552460.0, ans=0.0 2024-08-12 08:40:48,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1552560.0, ans=0.0 2024-08-12 08:41:01,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1552660.0, ans=0.125 2024-08-12 08:41:13,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10350, loss[loss=0.1247, beats_loss=0.01044, ecapa_loss=0.0002116, whisper_loss=0.1121, over 22513.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01106, ecapa_loss=0.0001796, whisper_loss=0.09172, over 3892321.65 frames. 
], batch size: 90, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:41:19,810 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 08:41:22,303 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 08:41:28,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1552860.0, ans=0.0 2024-08-12 08:41:31,609 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 08:41:39,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2024-08-12 08:41:48,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1552960.0, ans=0.0 2024-08-12 08:41:56,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1552960.0, ans=0.125 2024-08-12 08:42:06,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-12 08:42:07,506 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 08:42:17,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.587e+01 2.793e+01 3.199e+01 6.798e+01, threshold=5.587e+01, percent-clipped=1.0 2024-08-12 08:42:26,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553260.0, ans=0.1 2024-08-12 08:42:27,520 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10400, loss[loss=0.1114, beats_loss=0.01079, ecapa_loss=0.0002039, whisper_loss=0.09858, over 19433.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001804, whisper_loss=0.0927, over 3892098.21 frames. ], batch size: 80, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:42:29,318 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 08:42:31,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-08-12 08:42:38,584 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 11 from Vox, 52 fro AS 2024-08-12 08:42:43,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-12 08:42:44,343 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 08:42:45,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.16 vs. limit=5.0 2024-08-12 08:42:55,164 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 08:43:05,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1553460.0, ans=0.125 2024-08-12 08:43:16,002 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 08:43:34,744 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 08:43:38,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1553660.0, ans=0.125 2024-08-12 08:43:43,256 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10450, loss[loss=0.1027, beats_loss=0.01086, ecapa_loss=0.0001883, whisper_loss=0.08994, over 22422.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.0001784, whisper_loss=0.09189, over 3882272.66 frames. ], batch size: 92, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:43:46,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1553760.0, ans=0.0 2024-08-12 08:43:50,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2024-08-12 08:43:52,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1553760.0, ans=0.0 2024-08-12 08:43:52,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1553760.0, ans=0.125 2024-08-12 08:44:06,582 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 08:44:17,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1553960.0, ans=0.125 2024-08-12 08:44:20,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2024-08-12 08:44:32,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1554060.0, ans=0.1 2024-08-12 08:44:35,646 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 08:44:48,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.496e+01 2.841e+01 3.416e+01 4.859e+01, threshold=5.681e+01, percent-clipped=0.0 2024-08-12 08:44:56,615 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 08:44:56,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1554160.0, ans=0.0 2024-08-12 08:44:56,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1554160.0, ans=0.125 2024-08-12 08:44:59,533 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10500, loss[loss=0.09037, beats_loss=0.01135, ecapa_loss=0.0002019, whisper_loss=0.077, over 16184.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001772, whisper_loss=0.09222, over 3886094.61 frames. ], batch size: 67, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:45:09,935 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 08:45:22,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1554360.0, ans=0.0 2024-08-12 08:45:24,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-08-12 08:45:49,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1554560.0, ans=0.04949747468305833 2024-08-12 08:45:51,828 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 08:45:53,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-12 08:45:54,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1554560.0, ans=0.2 2024-08-12 08:46:03,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.79 vs. 
limit=12.0 2024-08-12 08:46:12,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10550, loss[loss=0.1259, beats_loss=0.007168, ecapa_loss=0.0002293, whisper_loss=0.1165, over 22599.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001775, whisper_loss=0.09221, over 3897529.24 frames. ], batch size: 93, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:46:16,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1554760.0, ans=0.0 2024-08-12 08:46:18,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1554760.0, ans=15.0 2024-08-12 08:46:21,458 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 08:46:23,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-12 08:46:47,173 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 08:47:02,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555060.0, ans=0.1 2024-08-12 08:47:08,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1555060.0, ans=0.0 2024-08-12 08:47:18,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.542e+01 2.754e+01 3.046e+01 4.371e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-12 08:47:23,613 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 08:47:29,391 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10600, loss[loss=0.1153, beats_loss=0.008166, ecapa_loss=0.0002192, whisper_loss=0.1049, over 19902.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001776, whisper_loss=0.09203, over 3888946.87 frames. ], batch size: 81, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:47:51,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555360.0, ans=0.1 2024-08-12 08:47:52,021 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 08:47:55,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1555360.0, ans=0.125 2024-08-12 08:48:16,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1555560.0, ans=0.0 2024-08-12 08:48:24,399 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 08:48:32,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1555660.0, ans=0.125 2024-08-12 08:48:38,937 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 32 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 08:48:43,069 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10650, loss[loss=0.1027, beats_loss=0.01125, ecapa_loss=0.0001556, whisper_loss=0.08993, over 18084.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01102, ecapa_loss=0.0001786, whisper_loss=0.09168, over 3840409.49 frames. ], batch size: 68, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:48:44,758 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 08:48:48,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1555760.0, ans=0.125 2024-08-12 08:48:58,757 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 08:49:18,978 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 08:49:36,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1556060.0, ans=0.125 2024-08-12 08:49:46,271 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 08:49:47,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.652e+01 2.957e+01 3.394e+01 5.576e+01, threshold=5.914e+01, percent-clipped=1.0 2024-08-12 08:49:58,764 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10700, loss[loss=0.1234, beats_loss=0.00832, ecapa_loss=0.0001753, whisper_loss=0.1134, over 21270.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.000177, whisper_loss=0.0915, over 3855088.33 frames. ], batch size: 81, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:50:03,555 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 08:50:08,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-12 08:50:11,307 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 08:50:26,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1556360.0, ans=0.2 2024-08-12 08:50:34,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1556460.0, ans=0.125 2024-08-12 08:50:57,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.89 vs. 
limit=12.0 2024-08-12 08:50:59,346 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 08:51:01,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1556660.0, ans=0.2 2024-08-12 08:51:12,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10750, loss[loss=0.1015, beats_loss=0.01091, ecapa_loss=0.000161, whisper_loss=0.089, over 18914.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001784, whisper_loss=0.09199, over 3878905.98 frames. ], batch size: 74, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:51:16,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2024-08-12 08:51:17,827 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 08:51:20,385 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 08:51:27,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1556860.0, ans=0.0 2024-08-12 08:51:32,170 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 08:51:50,520 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 08:51:56,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1557060.0, ans=0.0 2024-08-12 08:52:00,939 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 08:52:02,120 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 36 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 08:52:08,212 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 08:52:17,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.513e+01 2.826e+01 3.158e+01 5.993e+01, threshold=5.652e+01, percent-clipped=1.0 2024-08-12 08:52:19,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1557160.0, ans=0.0 2024-08-12 08:52:20,307 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 08:52:27,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1557260.0, ans=0.2 2024-08-12 08:52:28,017 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10800, loss[loss=0.09898, beats_loss=0.01148, ecapa_loss=0.0001669, whisper_loss=0.08583, over 21364.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01097, ecapa_loss=0.0001789, whisper_loss=0.09292, over 3890754.76 frames. ], batch size: 84, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:52:44,853 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 08:52:49,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1557360.0, ans=0.0 2024-08-12 08:53:32,621 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 08:53:38,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1557660.0, ans=0.1 2024-08-12 08:53:42,613 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10850, loss[loss=0.1218, beats_loss=0.009379, ecapa_loss=0.0002168, whisper_loss=0.1102, over 19585.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01098, ecapa_loss=0.00018, whisper_loss=0.0933, over 3913164.65 frames. 
], batch size: 84, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:54:17,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1557960.0, ans=0.0 2024-08-12 08:54:26,380 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-12 08:54:39,006 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 08:54:47,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.594e+01 2.957e+01 3.345e+01 7.139e+01, threshold=5.915e+01, percent-clipped=2.0 2024-08-12 08:54:50,612 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 08:54:55,591 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 08:54:59,326 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10900, loss[loss=0.1043, beats_loss=0.01146, ecapa_loss=0.0001741, whisper_loss=0.09105, over 18655.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.011, ecapa_loss=0.0001783, whisper_loss=0.0936, over 3919024.40 frames. ], batch size: 73, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:55:08,305 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:55:45,237 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.69 vs. 
limit=15.0 2024-08-12 08:55:48,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1558560.0, ans=0.125 2024-08-12 08:56:02,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1558660.0, ans=0.0 2024-08-12 08:56:04,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-12 08:56:15,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-12 08:56:16,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1558660.0, ans=0.125 2024-08-12 08:56:18,973 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 10950, loss[loss=0.1028, beats_loss=0.01109, ecapa_loss=0.0001708, whisper_loss=0.08996, over 21692.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01099, ecapa_loss=0.0001805, whisper_loss=0.09337, over 3913934.22 frames. ], batch size: 88, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:56:22,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-12 08:56:45,062 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.59 vs. limit=15.0 2024-08-12 08:56:46,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. 
limit=15.0 2024-08-12 08:56:47,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1558860.0, ans=0.125 2024-08-12 08:56:51,245 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 08:57:06,569 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 08:57:19,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1559060.0, ans=0.2 2024-08-12 08:57:36,537 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.556e+01 2.763e+01 3.156e+01 4.815e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 08:57:50,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11000, loss[loss=0.09725, beats_loss=0.01377, ecapa_loss=0.0001368, whisper_loss=0.08211, over 16056.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01093, ecapa_loss=0.0001811, whisper_loss=0.0935, over 3926814.62 frames. ], batch size: 66, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:57:55,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1559260.0, ans=0.1 2024-08-12 08:57:55,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1559260.0, ans=0.125 2024-08-12 08:58:03,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1559260.0, ans=0.2 2024-08-12 08:58:12,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1559360.0, ans=0.0 2024-08-12 08:58:13,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.11 vs. 
limit=15.0 2024-08-12 08:58:32,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-08-12 08:58:36,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1559460.0, ans=0.125 2024-08-12 08:58:40,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1559560.0, ans=0.125 2024-08-12 08:58:42,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.41 vs. limit=15.0 2024-08-12 08:58:43,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1559560.0, ans=0.0 2024-08-12 08:58:43,567 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.42 vs. limit=22.5 2024-08-12 08:58:44,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1559560.0, ans=0.2 2024-08-12 08:58:48,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1559560.0, ans=0.0 2024-08-12 08:58:53,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1559660.0, ans=0.125 2024-08-12 08:58:59,740 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 08:59:13,512 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11050, loss[loss=0.09223, beats_loss=0.01071, ecapa_loss=0.0001721, whisper_loss=0.0798, over 14015.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01102, ecapa_loss=0.0001794, whisper_loss=0.09285, over 3913636.25 frames. 
], batch size: 55, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:59:18,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1559760.0, ans=0.125 2024-08-12 08:59:21,923 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 08:59:31,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1559760.0, ans=0.125 2024-08-12 08:59:35,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1559860.0, ans=0.125 2024-08-12 08:59:40,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-08-12 08:59:41,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1559860.0, ans=0.07 2024-08-12 08:59:49,387 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 08:59:50,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-12 09:00:01,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1559960.0, ans=0.2 2024-08-12 09:00:27,432 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
14 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-12 09:00:32,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1560060.0, ans=0.125 2024-08-12 09:00:43,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1560160.0, ans=0.0 2024-08-12 09:00:49,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.392e+01 2.745e+01 3.211e+01 4.714e+01, threshold=5.490e+01, percent-clipped=0.0 2024-08-12 09:01:04,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11100, loss[loss=0.108, beats_loss=0.0106, ecapa_loss=0.0001735, whisper_loss=0.09565, over 15348.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001793, whisper_loss=0.09215, over 3896035.94 frames. ], batch size: 60, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:01:08,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1560260.0, ans=0.125 2024-08-12 09:01:33,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1560360.0, ans=0.125 2024-08-12 09:01:39,007 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 09:02:43,215 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 09:02:59,252 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11150, loss[loss=0.1087, beats_loss=0.009962, ecapa_loss=0.000238, whisper_loss=0.09635, over 15093.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001791, whisper_loss=0.09208, over 3898631.06 frames. 
], batch size: 62, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:03:14,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-12 09:03:26,935 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=12.0 2024-08-12 09:03:29,706 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 09:03:31,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1560860.0, ans=0.0 2024-08-12 09:03:39,177 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 09:04:01,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1560960.0, ans=0.125 2024-08-12 09:04:02,943 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 09:04:29,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.591e+01 2.914e+01 3.431e+01 1.120e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 09:04:40,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11200, loss[loss=0.09501, beats_loss=0.01063, ecapa_loss=0.0001811, whisper_loss=0.08257, over 22624.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01099, ecapa_loss=0.0001805, whisper_loss=0.09275, over 3892155.71 frames. 
], batch size: 94, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:04:51,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1561260.0, ans=0.1 2024-08-12 09:04:56,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1561360.0, ans=0.125 2024-08-12 09:05:01,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1561360.0, ans=0.125 2024-08-12 09:05:24,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1561560.0, ans=0.125 2024-08-12 09:05:30,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-08-12 09:05:31,671 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 09:05:43,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2024-08-12 09:05:55,282 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11250, loss[loss=0.107, beats_loss=0.0109, ecapa_loss=0.0001974, whisper_loss=0.09417, over 22786.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01101, ecapa_loss=0.0001801, whisper_loss=0.09301, over 3895957.71 frames. 
], batch size: 93, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:06:20,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1561860.0, ans=10.0 2024-08-12 09:06:33,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561960.0, ans=0.1 2024-08-12 09:06:45,894 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.353e+05 2024-08-12 09:07:00,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.463e+01 2.812e+01 3.090e+01 4.861e+01, threshold=5.624e+01, percent-clipped=0.0 2024-08-12 09:07:10,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1562260.0, ans=0.125 2024-08-12 09:07:12,184 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11300, loss[loss=0.1091, beats_loss=0.008244, ecapa_loss=0.000192, whisper_loss=0.09895, over 16539.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001795, whisper_loss=0.09256, over 3898683.31 frames. ], batch size: 66, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:07:18,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1562260.0, ans=0.1 2024-08-12 09:07:26,802 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 09:07:37,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1562360.0, ans=0.125 2024-08-12 09:07:44,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1562460.0, ans=0.0 2024-08-12 09:07:48,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1562460.0, ans=0.025 2024-08-12 09:08:12,681 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:08:19,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-12 09:08:22,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1562660.0, ans=0.125 2024-08-12 09:08:27,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11350, loss[loss=0.1151, beats_loss=0.009769, ecapa_loss=0.0001949, whisper_loss=0.1034, over 21555.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001784, whisper_loss=0.09234, over 3892876.77 frames. ], batch size: 88, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:08:32,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1562760.0, ans=0.0 2024-08-12 09:08:51,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1562860.0, ans=0.2 2024-08-12 09:08:51,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. 
limit=8.0 2024-08-12 09:08:57,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-08-12 09:09:04,079 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 31 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 09:09:05,565 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 09:09:07,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1562960.0, ans=0.07 2024-08-12 09:09:16,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1563060.0, ans=0.2 2024-08-12 09:09:32,436 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.664e+01 2.943e+01 3.527e+01 6.465e+01, threshold=5.886e+01, percent-clipped=3.0 2024-08-12 09:09:43,105 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11400, loss[loss=0.08003, beats_loss=0.01355, ecapa_loss=0.000147, whisper_loss=0.06501, over 14964.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01091, ecapa_loss=0.0001783, whisper_loss=0.09304, over 3878133.12 frames. ], batch size: 60, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:10:17,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=12.0 2024-08-12 09:10:18,507 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 09:10:36,216 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 09:10:38,431 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 09:10:51,311 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
16 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 09:10:58,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11450, loss[loss=0.1041, beats_loss=0.01162, ecapa_loss=0.0001298, whisper_loss=0.09122, over 21573.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01102, ecapa_loss=0.0001774, whisper_loss=0.09243, over 3893573.42 frames. ], batch size: 80, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:11:09,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1563760.0, ans=0.0 2024-08-12 09:11:13,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1563860.0, ans=0.125 2024-08-12 09:11:23,459 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 26 from LS+wenet, 7 from Vox, 28 fro AS 2024-08-12 09:11:51,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1564060.0, ans=0.125 2024-08-12 09:12:01,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.691e+01 2.984e+01 3.648e+01 5.377e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 09:12:12,745 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11500, loss[loss=0.116, beats_loss=0.01049, ecapa_loss=0.0001322, whisper_loss=0.1042, over 19680.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001761, whisper_loss=0.09248, over 3877146.97 frames. ], batch size: 75, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:12:32,519 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 09:12:37,185 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 09:12:47,264 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 14 from Vox, 52 fro AS 2024-08-12 09:13:02,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1564560.0, ans=0.09899494936611666 2024-08-12 09:13:10,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2024-08-12 09:13:17,690 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 09:13:25,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1564760.0, ans=0.1 2024-08-12 09:13:26,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11550, loss[loss=0.1097, beats_loss=0.0102, ecapa_loss=0.0001485, whisper_loss=0.09798, over 22611.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001772, whisper_loss=0.09219, over 3870627.73 frames. ], batch size: 85, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:13:33,279 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 09:13:36,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. 
limit=6.0 2024-08-12 09:14:20,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565060.0, ans=0.1 2024-08-12 09:14:31,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1565160.0, ans=0.0 2024-08-12 09:14:32,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.783e+01 3.251e+01 6.274e+01, threshold=5.566e+01, percent-clipped=2.0 2024-08-12 09:14:35,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=22.5 2024-08-12 09:14:39,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1565160.0, ans=0.0 2024-08-12 09:14:41,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11600, loss[loss=0.1111, beats_loss=0.009429, ecapa_loss=0.0002245, whisper_loss=0.09941, over 21566.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001774, whisper_loss=0.09188, over 3899433.36 frames. ], batch size: 89, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:14:54,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1565260.0, ans=0.0 2024-08-12 09:15:41,873 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 09:15:52,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11650, loss[loss=0.1053, beats_loss=0.01128, ecapa_loss=0.0001501, whisper_loss=0.09249, over 23897.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001772, whisper_loss=0.09254, over 3898573.27 frames. 
], batch size: 95, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:16:00,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1565760.0, ans=0.1 2024-08-12 09:16:12,763 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 09:16:26,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565960.0, ans=0.1 2024-08-12 09:16:28,454 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 09:16:51,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1566160.0, ans=0.125 2024-08-12 09:16:53,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.448e+01 2.832e+01 3.122e+01 7.544e+01, threshold=5.665e+01, percent-clipped=2.0 2024-08-12 09:16:56,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1566160.0, ans=0.125 2024-08-12 09:17:02,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5 2024-08-12 09:17:03,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11700, loss[loss=0.12, beats_loss=0.01149, ecapa_loss=0.0001718, whisper_loss=0.1068, over 21091.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001765, whisper_loss=0.09237, over 3888818.00 frames. 
], batch size: 82, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:17:04,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1566260.0, ans=0.125 2024-08-12 09:17:06,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1566260.0, ans=0.125 2024-08-12 09:17:24,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1566360.0, ans=0.0 2024-08-12 09:17:29,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1566460.0, ans=0.2 2024-08-12 09:17:33,311 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 09:17:33,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1566460.0, ans=0.07 2024-08-12 09:17:41,779 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 09:17:42,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1566460.0, ans=0.04949747468305833 2024-08-12 09:17:43,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1566560.0, ans=0.0 2024-08-12 09:17:55,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1566560.0, ans=0.0 2024-08-12 09:17:57,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1566660.0, ans=0.0 2024-08-12 09:18:05,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1566660.0, ans=0.1 2024-08-12 09:18:11,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11750, loss[loss=0.1065, beats_loss=0.01176, ecapa_loss=0.0001498, whisper_loss=0.09319, over 23133.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0112, ecapa_loss=0.0001761, whisper_loss=0.09212, over 3907216.57 frames. ], batch size: 89, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:18:20,860 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 09:18:23,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1566760.0, ans=0.1 2024-08-12 09:18:27,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-08-12 09:18:50,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1566960.0, ans=0.0 2024-08-12 09:18:55,914 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 09:19:12,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.551e+01 2.829e+01 3.227e+01 5.711e+01, threshold=5.658e+01, percent-clipped=1.0 2024-08-12 09:19:13,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567160.0, ans=0.1 2024-08-12 09:19:22,600 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11800, loss[loss=0.1251, beats_loss=0.00657, ecapa_loss=0.0002071, whisper_loss=0.1165, over 17537.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001764, whisper_loss=0.09257, over 3909284.78 frames. ], batch size: 69, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:19:30,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1567260.0, ans=15.0 2024-08-12 09:19:35,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1567360.0, ans=0.125 2024-08-12 09:19:57,725 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:20:17,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1567660.0, ans=0.125 2024-08-12 09:20:17,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1567660.0, ans=0.125 2024-08-12 09:20:18,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.35 vs. limit=10.0 2024-08-12 09:20:23,481 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
38 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 09:20:31,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.48 vs. limit=22.5 2024-08-12 09:20:31,843 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11850, loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001586, whisper_loss=0.09201, over 17480.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01119, ecapa_loss=0.0001764, whisper_loss=0.09339, over 3910002.73 frames. ], batch size: 69, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:20:32,070 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-12 09:20:44,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0 2024-08-12 09:20:53,134 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-12 09:20:55,331 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 09:20:59,328 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 09:20:59,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1567960.0, ans=0.125 2024-08-12 09:21:17,108 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 22 from LS+wenet, 23 from Vox, 51 fro AS 2024-08-12 09:21:21,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1568060.0, ans=0.125 2024-08-12 09:21:27,508 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 09:21:28,790 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
14 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 09:21:31,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.485e+01 2.770e+01 3.068e+01 4.213e+01, threshold=5.539e+01, percent-clipped=0.0 2024-08-12 09:21:31,727 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 09:21:39,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11900, loss[loss=0.1007, beats_loss=0.01151, ecapa_loss=0.0002044, whisper_loss=0.08713, over 16210.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01121, ecapa_loss=0.0001773, whisper_loss=0.09248, over 3903557.89 frames. ], batch size: 69, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:21:47,236 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2024-08-12 09:22:04,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1568360.0, ans=0.07 2024-08-12 09:22:43,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1568660.0, ans=0.125 2024-08-12 09:22:49,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 11950, loss[loss=0.07971, beats_loss=0.0118, ecapa_loss=0.0001515, whisper_loss=0.0664, over 18817.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01114, ecapa_loss=0.0001794, whisper_loss=0.09215, over 3911006.62 frames. ], batch size: 71, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:23:01,612 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 09:23:20,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.00 vs. 
limit=15.0 2024-08-12 09:23:23,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-12 09:23:29,347 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 09:23:31,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1569060.0, ans=0.0 2024-08-12 09:23:34,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1569060.0, ans=0.2 2024-08-12 09:23:34,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1569060.0, ans=0.0 2024-08-12 09:23:38,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1569060.0, ans=0.125 2024-08-12 09:23:51,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.558e+01 2.859e+01 3.291e+01 5.466e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 09:24:00,133 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12000, loss[loss=0.09761, beats_loss=0.0141, ecapa_loss=0.0001645, whisper_loss=0.08187, over 21610.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01116, ecapa_loss=0.0001781, whisper_loss=0.09139, over 3862427.71 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:24:00,133 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 09:24:27,443 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.0886, 3.0608, 3.2468, 3.5670], device='cuda:1') 2024-08-12 09:24:39,978 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0006057, whisper_loss=0.2491, over 922467.00 frames. 
2024-08-12 09:24:56,655 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on SV_voxceleb1: loss=0.004842, beats_loss=0, ecapa_loss=0.0004842, whisper_loss=0, over 939242.00 frames. 2024-08-12 09:25:23,656 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.4528, 2.0937, 2.5485, 2.8241], device='cuda:1') 2024-08-12 09:26:51,028 INFO [train_multi_KD3.py:1149] (1/4) Epoch 11, validation on AT_audioset: loss=0.02454, beats_loss=0.02454, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 09:26:51,032 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 09:27:02,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1569260.0, ans=0.125 2024-08-12 09:27:12,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2024-08-12 09:27:16,747 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 09:27:18,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1569460.0, ans=0.125 2024-08-12 09:27:24,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-08-12 09:27:30,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1569460.0, ans=0.0 2024-08-12 09:27:35,062 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
25 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 09:27:49,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1569660.0, ans=10.0 2024-08-12 09:27:50,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-12 09:27:52,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=12.0 2024-08-12 09:27:57,776 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 09:28:01,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12050, loss[loss=0.1151, beats_loss=0.01014, ecapa_loss=0.000151, whisper_loss=0.1035, over 23284.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01114, ecapa_loss=0.0001779, whisper_loss=0.09142, over 3854139.67 frames. ], batch size: 87, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:28:21,252 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 09:28:38,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1569960.0, ans=0.125 2024-08-12 09:28:51,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2024-08-12 09:29:03,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.535e+01 2.943e+01 3.446e+01 4.689e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 09:29:12,102 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12100, loss[loss=0.1037, beats_loss=0.01133, ecapa_loss=0.0001723, whisper_loss=0.09061, over 19152.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.000178, whisper_loss=0.09168, over 3844607.93 frames. ], batch size: 74, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:29:15,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1570260.0, ans=0.0 2024-08-12 09:29:36,510 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-12 09:29:38,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1570460.0, ans=0.2 2024-08-12 09:29:39,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1570460.0, ans=0.1 2024-08-12 09:29:59,769 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 09:30:01,447 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 09:30:22,761 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12150, loss[loss=0.1042, beats_loss=0.01049, ecapa_loss=0.0001807, whisper_loss=0.09191, over 22871.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001783, whisper_loss=0.09177, over 3806800.78 frames. 
], batch size: 92, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:30:33,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1570760.0, ans=0.125 2024-08-12 09:30:46,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1570860.0, ans=0.09899494936611666 2024-08-12 09:30:53,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1570960.0, ans=0.07 2024-08-12 09:31:02,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 09:31:05,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-12 09:31:25,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.520e+01 2.822e+01 3.048e+01 5.048e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 09:31:34,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12200, loss[loss=0.08679, beats_loss=0.01404, ecapa_loss=0.00017, whisper_loss=0.07105, over 21848.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001788, whisper_loss=0.09182, over 3843192.75 frames. ], batch size: 91, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:31:43,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1571260.0, ans=0.2 2024-08-12 09:31:56,500 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 09:31:58,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1571360.0, ans=0.1 2024-08-12 09:32:28,242 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
30 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 09:32:28,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1571560.0, ans=0.0 2024-08-12 09:32:37,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1571660.0, ans=0.125 2024-08-12 09:32:41,201 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 09:32:47,418 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12250, loss[loss=0.08124, beats_loss=0.01205, ecapa_loss=0.000198, whisper_loss=0.06721, over 16479.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.00018, whisper_loss=0.09227, over 3867489.12 frames. ], batch size: 67, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:32:54,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.40 vs. limit=10.0 2024-08-12 09:32:56,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1571760.0, ans=0.2 2024-08-12 09:32:59,284 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 38 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 09:32:59,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1571760.0, ans=0.1 2024-08-12 09:32:59,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1571760.0, ans=0.0 2024-08-12 09:33:01,901 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 09:33:38,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1572060.0, ans=0.125 2024-08-12 09:33:40,830 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 09:33:42,225 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 09:33:45,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1572160.0, ans=0.025 2024-08-12 09:33:48,042 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 09:33:51,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.601e+01 2.927e+01 3.328e+01 4.694e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 09:33:55,057 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 09:34:00,260 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12300, loss[loss=0.09383, beats_loss=0.01242, ecapa_loss=0.0001787, whisper_loss=0.07963, over 22553.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001794, whisper_loss=0.0917, over 3887531.14 frames. ], batch size: 91, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:34:29,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1572460.0, ans=0.1 2024-08-12 09:34:30,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1572460.0, ans=0.0 2024-08-12 09:34:40,326 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:34:50,835 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
19 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-12 09:34:52,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1572560.0, ans=0.1 2024-08-12 09:34:56,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1572660.0, ans=0.0 2024-08-12 09:35:01,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1572660.0, ans=0.0 2024-08-12 09:35:12,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12350, loss[loss=0.1078, beats_loss=0.01116, ecapa_loss=0.0001414, whisper_loss=0.09522, over 18820.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01106, ecapa_loss=0.0001821, whisper_loss=0.0917, over 3879967.91 frames. ], batch size: 71, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:35:22,632 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 09:35:25,361 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 09:35:36,604 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 09:35:51,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1572960.0, ans=0.1 2024-08-12 09:36:06,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1573060.0, ans=0.2 2024-08-12 09:36:11,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1573160.0, ans=0.2 2024-08-12 09:36:14,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1573160.0, ans=0.0 2024-08-12 09:36:16,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.617e+01 3.064e+01 3.584e+01 5.581e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 09:36:20,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2024-08-12 09:36:20,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-08-12 09:36:23,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1573160.0, ans=0.125 2024-08-12 09:36:25,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12400, loss[loss=0.096, beats_loss=0.01333, ecapa_loss=0.0001512, whisper_loss=0.08115, over 22909.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001803, whisper_loss=0.09167, over 3882394.87 frames. ], batch size: 94, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:36:25,758 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 11 from Vox, 49 fro AS 2024-08-12 09:36:26,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1573260.0, ans=0.0 2024-08-12 09:36:27,097 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 09:36:52,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1573360.0, ans=0.02 2024-08-12 09:37:14,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1573560.0, ans=0.0 2024-08-12 09:37:17,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2024-08-12 09:37:18,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1573560.0, ans=0.0 2024-08-12 09:37:24,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1573660.0, ans=0.1 2024-08-12 09:37:28,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1573660.0, ans=0.95 2024-08-12 09:37:37,087 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 09:37:38,188 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12450, loss[loss=0.1054, beats_loss=0.008489, ecapa_loss=0.0001848, whisper_loss=0.0951, over 20249.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001799, whisper_loss=0.09152, over 3853223.68 frames. 
], batch size: 81, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:38:04,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1573860.0, ans=0.125 2024-08-12 09:38:17,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2024-08-12 09:38:27,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1574060.0, ans=0.125 2024-08-12 09:38:40,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.466e+01 2.753e+01 3.048e+01 4.353e+01, threshold=5.506e+01, percent-clipped=0.0 2024-08-12 09:38:49,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12500, loss[loss=0.09531, beats_loss=0.01376, ecapa_loss=0.0001683, whisper_loss=0.07986, over 23448.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001789, whisper_loss=0.09189, over 3878023.43 frames. ], batch size: 96, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:38:57,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1574260.0, ans=0.0 2024-08-12 09:39:04,026 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-12 09:39:06,731 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 09:39:09,749 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 09:39:21,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1574460.0, ans=0.0 2024-08-12 09:39:34,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1574560.0, ans=0.0 2024-08-12 09:39:50,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1574660.0, ans=0.125 2024-08-12 09:39:51,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1574660.0, ans=0.1 2024-08-12 09:39:59,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12550, loss[loss=0.1081, beats_loss=0.01243, ecapa_loss=0.0001806, whisper_loss=0.09381, over 21333.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01107, ecapa_loss=0.0001801, whisper_loss=0.0921, over 3878747.93 frames. ], batch size: 87, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:39:59,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1574760.0, ans=0.0 2024-08-12 09:40:13,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1574860.0, ans=0.0 2024-08-12 09:40:22,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1574860.0, ans=0.125 2024-08-12 09:40:41,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1575060.0, ans=0.0 2024-08-12 09:40:52,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1575060.0, ans=0.0 2024-08-12 09:41:01,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.516e+01 2.754e+01 3.207e+01 3.892e+01, 
threshold=5.508e+01, percent-clipped=0.0 2024-08-12 09:41:05,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1575160.0, ans=0.125 2024-08-12 09:41:07,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1575160.0, ans=0.0 2024-08-12 09:41:09,229 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 09:41:09,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575260.0, ans=0.1 2024-08-12 09:41:10,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12600, loss[loss=0.08708, beats_loss=0.0122, ecapa_loss=0.0002008, whisper_loss=0.07286, over 20565.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001801, whisper_loss=0.09175, over 3867779.51 frames. ], batch size: 86, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:41:13,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1575260.0, ans=0.2 2024-08-12 09:41:21,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1575260.0, ans=0.0 2024-08-12 09:41:30,044 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:41:32,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-12 09:41:43,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5 2024-08-12 09:41:45,376 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
18 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-12 09:41:49,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1575460.0, ans=0.1 2024-08-12 09:41:53,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-12 09:41:59,566 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 33 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 09:42:16,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1575660.0, ans=0.125 2024-08-12 09:42:20,503 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12650, loss[loss=0.1069, beats_loss=0.01299, ecapa_loss=0.0001638, whisper_loss=0.09226, over 21527.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01111, ecapa_loss=0.0001802, whisper_loss=0.09142, over 3852630.58 frames. ], batch size: 88, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:42:37,133 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 09:42:38,467 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 09:42:40,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1575860.0, ans=0.2 2024-08-12 09:42:41,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1575860.0, ans=0.2 2024-08-12 09:42:47,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1575960.0, ans=0.125 2024-08-12 09:43:01,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1576060.0, ans=0.0 2024-08-12 09:43:06,020 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:43:22,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.519e+01 2.747e+01 3.019e+01 4.514e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 09:43:25,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1576160.0, ans=0.05 2024-08-12 09:43:27,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1576160.0, ans=0.0 2024-08-12 09:43:29,726 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 09:43:30,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12700, loss[loss=0.1092, beats_loss=0.0104, ecapa_loss=0.0001547, whisper_loss=0.09729, over 22358.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01109, ecapa_loss=0.0001792, whisper_loss=0.09206, over 3866130.13 frames. ], batch size: 87, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:43:41,913 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 09:43:46,566 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 09:44:01,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1576460.0, ans=0.125 2024-08-12 09:44:05,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2024-08-12 09:44:11,107 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:44:12,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1576560.0, ans=0.2 2024-08-12 09:44:15,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1576560.0, ans=15.0 2024-08-12 09:44:18,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1576560.0, ans=0.1 2024-08-12 09:44:20,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1576560.0, ans=0.0 2024-08-12 09:44:41,554 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12750, loss[loss=0.1091, beats_loss=0.01011, ecapa_loss=0.0001919, whisper_loss=0.09709, over 18836.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01115, ecapa_loss=0.0001794, whisper_loss=0.09185, over 3874852.78 frames. ], batch size: 75, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:44:50,361 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 25 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-12 09:44:54,143 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 09:45:01,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1576860.0, ans=0.1 2024-08-12 09:45:03,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1576860.0, ans=0.1 2024-08-12 09:45:04,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1576860.0, ans=0.2 2024-08-12 09:45:14,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1576960.0, ans=0.125 2024-08-12 09:45:29,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1577060.0, ans=0.125 2024-08-12 09:45:34,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-12 09:45:41,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.66 vs. limit=5.0 2024-08-12 09:45:43,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.827e+01 3.190e+01 5.112e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 09:45:45,416 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.543e+05 2024-08-12 09:45:48,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1577160.0, ans=0.2 2024-08-12 09:45:51,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12800, loss[loss=0.1132, beats_loss=0.01032, ecapa_loss=0.0001528, whisper_loss=0.1013, over 16935.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01113, ecapa_loss=0.0001796, whisper_loss=0.09223, over 3899065.13 frames. ], batch size: 64, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:45:55,504 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.803e+00 2024-08-12 09:46:27,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1577460.0, ans=0.0 2024-08-12 09:46:28,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1577460.0, ans=0.0 2024-08-12 09:46:51,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1577660.0, ans=0.0 2024-08-12 09:46:59,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.09 vs. limit=22.5 2024-08-12 09:47:02,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12850, loss[loss=0.09591, beats_loss=0.01263, ecapa_loss=0.0001789, whisper_loss=0.08149, over 17977.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01119, ecapa_loss=0.0001798, whisper_loss=0.09149, over 3904016.01 frames. ], batch size: 70, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:47:08,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=12.0 2024-08-12 09:47:18,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. 
limit=10.0 2024-08-12 09:47:36,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1577960.0, ans=0.1 2024-08-12 09:47:47,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1578060.0, ans=0.125 2024-08-12 09:47:48,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-12 09:47:50,199 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 09:48:04,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.513e+01 2.797e+01 3.147e+01 4.860e+01, threshold=5.595e+01, percent-clipped=0.0 2024-08-12 09:48:06,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1578160.0, ans=0.1 2024-08-12 09:48:12,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12900, loss[loss=0.08484, beats_loss=0.01148, ecapa_loss=0.0002067, whisper_loss=0.07129, over 17862.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01111, ecapa_loss=0.0001797, whisper_loss=0.09176, over 3893115.90 frames. ], batch size: 78, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:48:16,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.38 vs. limit=15.0 2024-08-12 09:48:32,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1578360.0, ans=0.0 2024-08-12 09:48:44,892 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 09:49:05,892 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 09:49:12,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-12 09:49:21,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 12950, loss[loss=0.08355, beats_loss=0.01226, ecapa_loss=0.0001741, whisper_loss=0.06954, over 16725.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0111, ecapa_loss=0.0001787, whisper_loss=0.09165, over 3936110.67 frames. ], batch size: 69, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:49:33,591 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 09:50:08,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1579060.0, ans=0.125 2024-08-12 09:50:11,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2024-08-12 09:50:11,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=15.0 2024-08-12 09:50:24,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.602e+01 2.996e+01 3.291e+01 5.195e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-12 09:50:33,644 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13000, loss[loss=0.08782, beats_loss=0.01084, ecapa_loss=0.0001951, whisper_loss=0.07503, over 14609.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001793, whisper_loss=0.09185, over 3941077.86 frames. ], batch size: 61, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:50:38,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2024-08-12 09:50:56,541 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 09:51:17,920 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 09:51:20,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1579560.0, ans=0.04949747468305833 2024-08-12 09:51:21,935 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 18 from LS+wenet, 34 from Vox, 32 fro AS 2024-08-12 09:51:23,277 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-12 09:51:29,193 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 09:51:34,869 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 09:51:39,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-12 09:51:40,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1579660.0, ans=0.0 2024-08-12 09:51:44,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13050, loss[loss=0.1078, beats_loss=0.0102, ecapa_loss=0.0001767, whisper_loss=0.09578, over 21807.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01104, ecapa_loss=0.0001779, whisper_loss=0.09222, over 3928559.96 frames. ], batch size: 89, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:51:44,668 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 09:51:44,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1579760.0, ans=0.0 2024-08-12 09:51:46,016 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 09:51:56,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1579760.0, ans=0.125 2024-08-12 09:52:07,402 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 09:52:10,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1579860.0, ans=0.125 2024-08-12 09:52:12,748 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 09:52:15,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1579960.0, ans=0.125 2024-08-12 09:52:16,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-12 09:52:18,794 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 09:52:26,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. 
limit=15.0 2024-08-12 09:52:31,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1580060.0, ans=0.125 2024-08-12 09:52:39,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1580160.0, ans=0.125 2024-08-12 09:52:40,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1580160.0, ans=0.0 2024-08-12 09:52:46,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.449e+01 2.683e+01 3.089e+01 1.742e+02, threshold=5.367e+01, percent-clipped=1.0 2024-08-12 09:52:54,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13100, loss[loss=0.1183, beats_loss=0.01194, ecapa_loss=0.0001322, whisper_loss=0.1051, over 19309.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.0001769, whisper_loss=0.0924, over 3918575.01 frames. ], batch size: 73, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:53:05,749 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=12.0 2024-08-12 09:53:07,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1580260.0, ans=6.0 2024-08-12 09:53:09,117 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
23 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-12 09:53:15,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1580360.0, ans=0.0 2024-08-12 09:53:16,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1580360.0, ans=0.125 2024-08-12 09:53:18,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1580360.0, ans=0.1 2024-08-12 09:53:22,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=22.5 2024-08-12 09:53:23,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1580460.0, ans=0.04949747468305833 2024-08-12 09:53:26,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1580460.0, ans=0.5 2024-08-12 09:53:38,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1580560.0, ans=0.09899494936611666 2024-08-12 09:53:40,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1580560.0, ans=0.04949747468305833 2024-08-12 09:53:41,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1580560.0, ans=0.125 2024-08-12 09:53:46,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1580560.0, ans=0.2 2024-08-12 09:53:59,366 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 09:54:05,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13150, loss[loss=0.1075, beats_loss=0.008604, ecapa_loss=0.0002469, whisper_loss=0.09646, over 22086.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01098, ecapa_loss=0.0001774, whisper_loss=0.09344, over 3922006.67 frames. ], batch size: 94, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:54:55,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2024-08-12 09:55:00,940 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:55:05,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1581160.0, ans=0.125 2024-08-12 09:55:07,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.633e+01 2.862e+01 3.411e+01 5.758e+01, threshold=5.724e+01, percent-clipped=1.0 2024-08-12 09:55:16,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13200, loss[loss=0.09793, beats_loss=0.01065, ecapa_loss=0.0001777, whisper_loss=0.0855, over 20779.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01096, ecapa_loss=0.0001774, whisper_loss=0.09302, over 3916925.91 frames. ], batch size: 81, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:55:18,415 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 09:55:37,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1581360.0, ans=0.125 2024-08-12 09:55:39,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.02 vs. 
limit=5.0 2024-08-12 09:55:39,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1581360.0, ans=10.0 2024-08-12 09:55:56,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1581460.0, ans=0.0 2024-08-12 09:56:02,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1581560.0, ans=0.0 2024-08-12 09:56:02,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-08-12 09:56:05,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1581560.0, ans=0.125 2024-08-12 09:56:27,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13250, loss[loss=0.1186, beats_loss=0.008252, ecapa_loss=0.0001925, whisper_loss=0.1085, over 22588.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01091, ecapa_loss=0.0001774, whisper_loss=0.09299, over 3883299.42 frames. ], batch size: 91, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:56:51,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1581860.0, ans=0.07 2024-08-12 09:57:28,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1582160.0, ans=0.125 2024-08-12 09:57:30,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.621e+01 2.894e+01 3.453e+01 5.158e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-12 09:57:30,797 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 09:57:34,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1582160.0, ans=0.0 2024-08-12 09:57:38,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13300, loss[loss=0.1101, beats_loss=0.01256, ecapa_loss=0.0001693, whisper_loss=0.09581, over 15152.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001774, whisper_loss=0.09278, over 3869523.14 frames. ], batch size: 63, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:58:23,763 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 09:58:25,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1582560.0, ans=0.125 2024-08-12 09:58:36,652 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 09:58:47,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1582660.0, ans=0.0 2024-08-12 09:58:49,838 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13350, loss[loss=0.1268, beats_loss=0.01081, ecapa_loss=0.0001943, whisper_loss=0.1141, over 22437.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001776, whisper_loss=0.09282, over 3881875.02 frames. ], batch size: 89, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:58:51,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1582760.0, ans=0.1 2024-08-12 09:58:56,984 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 09:58:58,225 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 09:58:59,633 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 09:58:59,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1582760.0, ans=0.2 2024-08-12 09:59:01,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1582760.0, ans=0.125 2024-08-12 09:59:14,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1582860.0, ans=0.125 2024-08-12 09:59:25,327 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 12 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 09:59:25,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1582960.0, ans=0.125 2024-08-12 09:59:25,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1582960.0, ans=0.1 2024-08-12 09:59:31,426 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 09:59:31,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1583060.0, ans=0.0 2024-08-12 09:59:36,957 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 09:59:41,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1583060.0, ans=0.125 2024-08-12 09:59:51,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.599e+01 2.960e+01 3.368e+01 5.094e+01, threshold=5.919e+01, percent-clipped=0.0 2024-08-12 10:00:00,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13400, loss[loss=0.1099, beats_loss=0.008738, ecapa_loss=0.0001453, whisper_loss=0.09968, over 18037.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01088, ecapa_loss=0.0001772, whisper_loss=0.09293, over 3874388.33 frames. ], batch size: 68, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:00:04,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1583260.0, ans=0.09899494936611666 2024-08-12 10:00:07,310 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-12 10:00:29,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1583460.0, ans=0.125 2024-08-12 10:00:37,361 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 10:00:41,523 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 10:00:41,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1583560.0, ans=0.05 2024-08-12 10:00:43,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-12 10:00:46,089 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.698e+01 2024-08-12 10:00:57,938 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 13 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-12 10:00:59,358 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 35 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 10:00:59,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1583660.0, ans=0.125 2024-08-12 10:01:10,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13450, loss[loss=0.09613, beats_loss=0.01094, ecapa_loss=0.0001687, whisper_loss=0.08351, over 16699.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.000178, whisper_loss=0.09264, over 3900015.76 frames. ], batch size: 65, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:01:10,294 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 10:01:22,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1583760.0, ans=0.1 2024-08-12 10:01:23,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1583860.0, ans=0.1 2024-08-12 10:01:34,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1583860.0, ans=0.2 2024-08-12 10:01:41,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:02:01,509 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 10:02:11,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.426e+01 2.699e+01 3.096e+01 4.776e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-12 10:02:13,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1584160.0, ans=0.125 2024-08-12 10:02:19,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13500, loss[loss=0.09962, beats_loss=0.009531, ecapa_loss=0.0002271, whisper_loss=0.08782, over 15156.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01092, ecapa_loss=0.0001795, whisper_loss=0.09277, over 3888675.70 frames. 
], batch size: 63, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:02:21,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1584260.0, ans=0.0 2024-08-12 10:02:27,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1584260.0, ans=0.125 2024-08-12 10:02:34,796 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:02:45,633 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 10:02:48,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1584460.0, ans=0.125 2024-08-12 10:03:12,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1584560.0, ans=0.1 2024-08-12 10:03:27,734 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 10:03:30,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13550, loss[loss=0.1112, beats_loss=0.01133, ecapa_loss=0.0001344, whisper_loss=0.09849, over 21168.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001778, whisper_loss=0.0921, over 3889503.23 frames. ], batch size: 80, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:04:00,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1584960.0, ans=0.125 2024-08-12 10:04:17,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1585060.0, ans=0.125 2024-08-12 10:04:26,444 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 10:04:31,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.628e+01 2.875e+01 3.352e+01 5.913e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 10:04:32,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-12 10:04:36,558 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 10:04:40,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13600, loss[loss=0.0926, beats_loss=0.01403, ecapa_loss=0.0001791, whisper_loss=0.07678, over 20833.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.0001777, whisper_loss=0.09218, over 3885689.48 frames. ], batch size: 88, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:04:56,686 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 10:05:10,570 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 10:05:16,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1585460.0, ans=0.125 2024-08-12 10:05:19,026 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 31 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 10:05:21,810 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-12 10:05:23,162 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 10:05:23,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1585560.0, ans=0.0 2024-08-12 10:05:48,734 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13650, loss[loss=0.1159, beats_loss=0.01065, ecapa_loss=0.0001854, whisper_loss=0.1034, over 22226.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.0001773, whisper_loss=0.09239, over 3902935.50 frames. ], batch size: 88, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:05:52,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1585760.0, ans=0.125 2024-08-12 10:06:03,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1585860.0, ans=0.1 2024-08-12 10:06:05,029 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.306e+01 2024-08-12 10:06:07,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-08-12 10:06:19,986 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-12 10:06:31,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1586060.0, ans=0.1 2024-08-12 10:06:50,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.547e+01 2.720e+01 3.156e+01 5.627e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 10:06:59,280 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13700, loss[loss=0.06746, beats_loss=0.01725, ecapa_loss=0.0001595, whisper_loss=0.04862, over 15788.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.0001778, whisper_loss=0.09255, over 3890919.18 frames. ], batch size: 68, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:07:16,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1586360.0, ans=0.1 2024-08-12 10:07:17,529 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-12 10:07:25,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1586460.0, ans=0.025 2024-08-12 10:07:30,938 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 10:07:36,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1586460.0, ans=0.125 2024-08-12 10:07:36,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1586460.0, ans=0.2 2024-08-12 10:07:43,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1586560.0, ans=0.0 2024-08-12 10:07:50,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1586560.0, ans=0.125 2024-08-12 10:08:02,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1586660.0, ans=0.0 2024-08-12 10:08:09,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13750, loss[loss=0.123, beats_loss=0.01018, ecapa_loss=0.0001985, whisper_loss=0.1109, over 22678.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01109, ecapa_loss=0.0001766, whisper_loss=0.09295, over 3895874.76 frames. 
], batch size: 89, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:08:12,816 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:08:26,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1586860.0, ans=0.1 2024-08-12 10:08:55,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587060.0, ans=0.1 2024-08-12 10:09:03,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2024-08-12 10:09:11,463 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.427e+01 2.784e+01 3.131e+01 5.573e+01, threshold=5.568e+01, percent-clipped=1.0 2024-08-12 10:09:16,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1587160.0, ans=0.125 2024-08-12 10:09:20,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13800, loss[loss=0.09508, beats_loss=0.0114, ecapa_loss=0.0001933, whisper_loss=0.08174, over 21296.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01115, ecapa_loss=0.0001775, whisper_loss=0.09246, over 3902005.79 frames. ], batch size: 91, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:09:36,577 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 10:09:48,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1587460.0, ans=0.125 2024-08-12 10:10:03,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1587560.0, ans=0.1 2024-08-12 10:10:06,265 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-08-12 10:10:23,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1587660.0, ans=0.1 2024-08-12 10:10:32,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13850, loss[loss=0.06357, beats_loss=0.008748, ecapa_loss=0.0002012, whisper_loss=0.05281, over 13845.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01117, ecapa_loss=0.0001769, whisper_loss=0.09185, over 3902547.69 frames. ], batch size: 57, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:11:06,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-12 10:11:10,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1587960.0, ans=0.125 2024-08-12 10:11:13,983 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 10:11:31,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1588160.0, ans=0.0 2024-08-12 10:11:35,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.570e+01 2.844e+01 3.264e+01 2.322e+02, threshold=5.688e+01, percent-clipped=2.0 2024-08-12 10:11:44,082 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13900, loss[loss=0.07442, beats_loss=0.01548, ecapa_loss=0.0001272, whisper_loss=0.05768, over 14226.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.0001766, whisper_loss=0.09175, over 3910070.57 frames. ], batch size: 55, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:11:47,278 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 10:12:13,707 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 10:12:24,697 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 10:12:25,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1588460.0, ans=0.0 2024-08-12 10:12:31,157 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-12 10:12:34,257 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 10:12:45,328 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 10:12:47,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1588660.0, ans=0.1 2024-08-12 10:12:47,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1588660.0, ans=0.05 2024-08-12 10:13:00,949 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 13950, loss[loss=0.0706, beats_loss=0.01452, ecapa_loss=0.0001817, whisper_loss=0.05427, over 22546.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001777, whisper_loss=0.09275, over 3904075.06 frames. ], batch size: 97, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:13:06,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2024-08-12 10:13:13,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1588760.0, ans=0.125 2024-08-12 10:13:13,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1588760.0, ans=0.2 2024-08-12 10:13:52,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1589060.0, ans=0.125 2024-08-12 10:14:14,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.447e+01 2.683e+01 3.149e+01 1.029e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-12 10:14:15,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2024-08-12 10:14:20,782 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 10:14:23,665 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14000, loss[loss=0.1333, beats_loss=0.009352, ecapa_loss=0.0001448, whisper_loss=0.1225, over 20305.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001769, whisper_loss=0.09328, over 3885767.14 frames. ], batch size: 73, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:14:39,475 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 10:14:41,437 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 10:14:53,845 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 10:15:11,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-08-12 10:15:34,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1589660.0, ans=0.125 2024-08-12 10:15:35,726 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 10:15:41,946 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14050, loss[loss=0.1292, beats_loss=0.008717, ecapa_loss=0.0001575, whisper_loss=0.1189, over 16000.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01094, ecapa_loss=0.0001779, whisper_loss=0.09348, over 3845141.73 frames. ], batch size: 59, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:15:52,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.03 vs. 
limit=15.0 2024-08-12 10:16:16,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1589960.0, ans=0.125 2024-08-12 10:16:28,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-12 10:16:31,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1590060.0, ans=0.0 2024-08-12 10:16:36,652 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 10:16:44,902 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 10:16:54,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.972e+01 3.503e+01 4.652e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-12 10:16:56,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1590160.0, ans=6.0 2024-08-12 10:17:03,688 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14100, loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0002054, whisper_loss=0.08841, over 19731.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01092, ecapa_loss=0.0001777, whisper_loss=0.09383, over 3842818.18 frames. ], batch size: 79, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:17:20,016 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 10:17:23,098 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 10:17:31,793 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 10:17:38,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1590460.0, ans=0.04949747468305833 2024-08-12 10:17:41,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1590460.0, ans=10.0 2024-08-12 10:17:51,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1590560.0, ans=0.1 2024-08-12 10:17:56,621 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 10:18:08,490 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 10:18:15,976 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 10:18:23,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14150, loss[loss=0.1164, beats_loss=0.01016, ecapa_loss=0.0001803, whisper_loss=0.1044, over 18210.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.0001765, whisper_loss=0.09304, over 3874581.06 frames. ], batch size: 70, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:18:28,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1590760.0, ans=0.125 2024-08-12 10:18:32,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1590760.0, ans=0.125 2024-08-12 10:19:06,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1590960.0, ans=0.125 2024-08-12 10:19:10,081 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
26 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 10:19:17,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1591060.0, ans=0.125 2024-08-12 10:19:20,209 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 10:19:25,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1591060.0, ans=0.125 2024-08-12 10:19:39,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.532e+01 2.801e+01 3.352e+01 7.282e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 10:19:49,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14200, loss[loss=0.1068, beats_loss=0.01229, ecapa_loss=0.000153, whisper_loss=0.09293, over 21619.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01102, ecapa_loss=0.0001763, whisper_loss=0.09265, over 3843632.32 frames. ], batch size: 87, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:20:00,067 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-12 10:20:00,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1591260.0, ans=0.5 2024-08-12 10:20:01,980 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 10:20:08,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1591360.0, ans=0.0 2024-08-12 10:20:25,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1591460.0, ans=0.125 2024-08-12 10:21:11,669 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14250, loss[loss=0.1045, beats_loss=0.01039, ecapa_loss=0.0001625, whisper_loss=0.0925, over 18454.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01108, ecapa_loss=0.0001752, whisper_loss=0.09217, over 3852025.47 frames. ], batch size: 73, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:21:32,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1591860.0, ans=0.125 2024-08-12 10:21:46,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1591960.0, ans=0.1 2024-08-12 10:21:57,970 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 10:21:59,730 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 10:22:16,012 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 10:22:21,093 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 10:22:22,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.447e+01 2.773e+01 3.183e+01 5.230e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 10:22:28,207 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 10:22:28,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2024-08-12 10:22:33,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14300, loss[loss=0.101, beats_loss=0.01243, ecapa_loss=0.0001745, whisper_loss=0.08685, over 22324.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01106, ecapa_loss=0.0001746, whisper_loss=0.09282, over 3897426.10 frames. 
], batch size: 90, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:22:41,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1592260.0, ans=0.2 2024-08-12 10:22:49,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2024-08-12 10:23:06,073 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 10 from LS+wenet, 10 from Vox, 37 fro AS 2024-08-12 10:23:06,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1592460.0, ans=0.07 2024-08-12 10:23:28,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1592560.0, ans=0.125 2024-08-12 10:23:31,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1592560.0, ans=0.125 2024-08-12 10:23:51,569 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-12 10:23:55,901 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14350, loss[loss=0.1198, beats_loss=0.00948, ecapa_loss=0.0001815, whisper_loss=0.1085, over 23874.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001753, whisper_loss=0.09279, over 3904752.90 frames. ], batch size: 90, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:24:00,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2024-08-12 10:24:04,266 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 10:24:13,447 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
19 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-12 10:24:18,948 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 10:24:30,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1592960.0, ans=0.125 2024-08-12 10:24:41,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1593060.0, ans=0.125 2024-08-12 10:25:00,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.480e+01 2.799e+01 3.080e+01 4.714e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 10:25:04,745 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:25:08,495 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14400, loss[loss=0.1204, beats_loss=0.01045, ecapa_loss=0.0001675, whisper_loss=0.1083, over 23709.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001773, whisper_loss=0.09316, over 3908751.79 frames. ], batch size: 92, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:25:20,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-12 10:25:32,182 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-12 10:25:36,809 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 10:26:01,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1593560.0, ans=0.025 2024-08-12 10:26:03,159 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 10:26:06,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1593660.0, ans=0.0 2024-08-12 10:26:13,183 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 10:26:21,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 11, batch 14450, loss[loss=0.1061, beats_loss=0.0103, ecapa_loss=0.0001942, whisper_loss=0.09383, over 18429.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001769, whisper_loss=0.09225, over 3890882.69 frames. ], batch size: 71, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:26:25,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-12 10:26:29,106 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 16 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 10:26:33,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1593760.0, ans=0.07 2024-08-12 10:26:33,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1593760.0, ans=0.125 2024-08-12 10:26:51,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1593960.0, ans=0.0 2024-08-12 10:27:00,930 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 10:27:48,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 0, loss[loss=0.1164, beats_loss=0.009559, ecapa_loss=0.0002147, whisper_loss=0.1047, over 16292.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.009559, ecapa_loss=0.0002147, whisper_loss=0.1047, over 16292.00 frames. 
], batch size: 62, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:27:48,666 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 10:28:26,854 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on ASR_libri: loss=0.2553, beats_loss=0, ecapa_loss=0.0005949, whisper_loss=0.2493, over 922467.00 frames. 2024-08-12 10:28:43,286 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on SV_voxceleb1: loss=0.004912, beats_loss=0, ecapa_loss=0.0004912, whisper_loss=0, over 939242.00 frames. 2024-08-12 10:30:40,434 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on AT_audioset: loss=0.02433, beats_loss=0.02433, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 10:30:40,438 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 10:30:40,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1594110.0, ans=0.125 2024-08-12 10:30:50,387 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 10:30:59,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.491e+01 2.893e+01 3.197e+01 9.364e+01, threshold=5.786e+01, percent-clipped=1.0 2024-08-12 10:31:17,745 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.898e-01 2024-08-12 10:31:54,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1594410.0, ans=0.125 2024-08-12 10:32:04,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1594510.0, ans=0.125 2024-08-12 10:32:14,442 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 10:32:22,406 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 10:32:24,840 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 50, loss[loss=0.08765, beats_loss=0.009909, ecapa_loss=0.0002008, whisper_loss=0.07573, over 15714.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01021, ecapa_loss=0.0001869, whisper_loss=0.09193, over 871005.44 frames. ], batch size: 61, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:32:28,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1594610.0, ans=0.125 2024-08-12 10:32:34,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1594610.0, ans=0.0 2024-08-12 10:32:44,177 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 10:32:54,618 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-12 10:33:22,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1594810.0, ans=0.125 2024-08-12 10:33:22,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1594810.0, ans=0.0 2024-08-12 10:33:33,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1594910.0, ans=0.05 2024-08-12 10:33:45,424 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 10:34:02,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1595010.0, ans=0.0 2024-08-12 10:34:13,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1595110.0, ans=0.125 2024-08-12 10:34:14,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 100, loss[loss=0.1149, beats_loss=0.009236, ecapa_loss=0.0002113, whisper_loss=0.1036, over 16426.00 frames. ], tot_loss[loss=0.104, beats_loss=0.009983, ecapa_loss=0.0001819, whisper_loss=0.09224, over 1523076.42 frames. ], batch size: 65, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:34:21,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1595110.0, ans=0.0 2024-08-12 10:34:28,713 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 10:34:34,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.774e+01 3.018e+01 3.442e+01 6.372e+01, threshold=6.036e+01, percent-clipped=2.0 2024-08-12 10:35:11,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1595310.0, ans=0.125 2024-08-12 10:35:48,242 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-12 10:35:50,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-08-12 10:36:03,700 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 150, loss[loss=0.08169, beats_loss=0.01068, ecapa_loss=0.0002229, whisper_loss=0.06878, over 17851.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01018, ecapa_loss=0.0001807, whisper_loss=0.09143, over 2040273.38 frames. 
], batch size: 76, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:36:10,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-12 10:36:26,888 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-12 10:36:35,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1595710.0, ans=0.125 2024-08-12 10:36:40,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1595810.0, ans=0.125 2024-08-12 10:36:48,595 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 36 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 10:37:22,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1596010.0, ans=0.125 2024-08-12 10:37:31,893 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 200, loss[loss=0.08415, beats_loss=0.01271, ecapa_loss=0.000162, whisper_loss=0.06981, over 17464.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001798, whisper_loss=0.09127, over 2434284.45 frames. 
], batch size: 73, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:37:37,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1596110.0, ans=0.0 2024-08-12 10:37:49,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.723e+01 2.993e+01 3.587e+01 5.466e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-12 10:38:24,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1596310.0, ans=0.2 2024-08-12 10:38:37,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1596410.0, ans=0.1 2024-08-12 10:38:42,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2024-08-12 10:38:59,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 250, loss[loss=0.08742, beats_loss=0.01108, ecapa_loss=0.0001639, whisper_loss=0.07471, over 16022.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0103, ecapa_loss=0.0001801, whisper_loss=0.09269, over 2750466.54 frames. ], batch size: 62, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:39:16,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1596710.0, ans=0.1 2024-08-12 10:39:26,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-12 10:39:37,962 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 10:39:38,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1596810.0, ans=0.125 2024-08-12 10:39:55,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1596910.0, ans=0.1 2024-08-12 10:39:56,623 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 10:39:58,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-08-12 10:40:08,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-08-12 10:40:19,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 300, loss[loss=0.08283, beats_loss=0.01255, ecapa_loss=0.0001462, whisper_loss=0.06882, over 16802.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01044, ecapa_loss=0.0001814, whisper_loss=0.09328, over 2977052.68 frames. ], batch size: 67, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:40:23,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1597110.0, ans=0.125 2024-08-12 10:40:34,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.533e+01 2.859e+01 3.181e+01 4.204e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 10:40:37,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1597210.0, ans=0.2 2024-08-12 10:40:44,412 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-12 10:40:55,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1597310.0, ans=0.125 2024-08-12 10:40:56,591 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 10:41:07,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1597410.0, ans=0.0 2024-08-12 10:41:21,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1597410.0, ans=0.0 2024-08-12 10:41:28,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1597510.0, ans=0.125 2024-08-12 10:41:37,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2024-08-12 10:41:39,827 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 350, loss[loss=0.1095, beats_loss=0.01078, ecapa_loss=0.0001472, whisper_loss=0.09727, over 22625.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01046, ecapa_loss=0.0001795, whisper_loss=0.09375, over 3193428.96 frames. ], batch size: 89, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:42:26,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1597910.0, ans=0.1 2024-08-12 10:42:40,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1598010.0, ans=0.05 2024-08-12 10:42:56,017 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 10:42:57,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 400, loss[loss=0.1018, beats_loss=0.008851, ecapa_loss=0.0002047, whisper_loss=0.0909, over 15563.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01058, ecapa_loss=0.0001786, whisper_loss=0.09255, over 3317867.27 frames. ], batch size: 62, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:43:11,802 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.512e+01 2.716e+01 3.145e+01 4.909e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-12 10:43:37,532 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 10:43:59,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2024-08-12 10:44:07,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1598510.0, ans=0.09899494936611666 2024-08-12 10:44:09,826 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-12 10:44:15,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 450, loss[loss=0.1014, beats_loss=0.01032, ecapa_loss=0.0002029, whisper_loss=0.08908, over 17968.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01066, ecapa_loss=0.0001795, whisper_loss=0.09228, over 3418427.82 frames. ], batch size: 73, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:44:34,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1598710.0, ans=0.0 2024-08-12 10:44:35,351 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 13 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 10:44:49,669 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 10:45:06,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1598910.0, ans=0.2 2024-08-12 10:45:25,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1599010.0, ans=15.0 2024-08-12 10:45:32,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 500, loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001936, whisper_loss=0.09067, over 21426.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001789, whisper_loss=0.09163, over 3524251.27 frames. ], batch size: 86, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:45:33,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1599110.0, ans=0.0 2024-08-12 10:45:34,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1599110.0, ans=0.1 2024-08-12 10:45:34,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1599110.0, ans=0.0 2024-08-12 10:45:44,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1599110.0, ans=0.125 2024-08-12 10:45:45,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1599110.0, ans=0.0 2024-08-12 10:45:46,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.446e+01 2.825e+01 3.305e+01 5.621e+01, threshold=5.651e+01, percent-clipped=2.0 2024-08-12 10:45:56,761 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.936e-01 2024-08-12 10:45:59,956 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 10:46:06,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1599310.0, ans=0.125 2024-08-12 10:46:15,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1599310.0, ans=0.125 2024-08-12 10:46:20,084 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 10:46:31,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1599410.0, ans=0.125 2024-08-12 10:46:35,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-12 10:46:38,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1599510.0, ans=0.125 2024-08-12 10:46:52,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 550, loss[loss=0.1026, beats_loss=0.01214, ecapa_loss=0.0001367, whisper_loss=0.08909, over 19707.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001781, whisper_loss=0.09179, over 3617813.17 frames. ], batch size: 75, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:46:55,213 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 10:47:07,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1599610.0, ans=10.0 2024-08-12 10:47:43,305 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 10:48:13,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1600110.0, ans=0.125 2024-08-12 10:48:14,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 600, loss[loss=0.09499, beats_loss=0.01246, ecapa_loss=0.000163, whisper_loss=0.0809, over 22447.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.000176, whisper_loss=0.09191, over 3660535.64 frames. ], batch size: 89, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:48:17,061 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:48:22,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1600110.0, ans=0.125 2024-08-12 10:48:24,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1600110.0, ans=0.125 2024-08-12 10:48:28,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.536e+01 2.795e+01 3.405e+01 6.348e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-12 10:48:37,711 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 10:48:46,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0 2024-08-12 10:48:48,367 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 10:48:50,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1600310.0, ans=0.2 2024-08-12 10:48:56,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1600310.0, ans=0.2 2024-08-12 10:49:19,673 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 10:49:19,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1600510.0, ans=0.0 2024-08-12 10:49:21,019 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 10:49:31,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 650, loss[loss=0.1041, beats_loss=0.009555, ecapa_loss=0.0001661, whisper_loss=0.09286, over 19109.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001744, whisper_loss=0.09131, over 3687494.29 frames. ], batch size: 73, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:49:55,590 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 10:50:02,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1600710.0, ans=0.0 2024-08-12 10:50:08,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1600810.0, ans=0.0 2024-08-12 10:50:41,053 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 10:50:44,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1601010.0, ans=0.125 2024-08-12 10:50:44,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1601010.0, ans=0.125 2024-08-12 10:50:52,218 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 700, loss[loss=0.09746, beats_loss=0.01112, ecapa_loss=0.0002037, whisper_loss=0.08431, over 20428.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.000175, whisper_loss=0.09124, over 3701007.55 frames. ], batch size: 85, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:50:56,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1601110.0, ans=0.07 2024-08-12 10:50:57,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1601110.0, ans=0.025 2024-08-12 10:51:06,157 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.429e+01 2.647e+01 2.906e+01 4.054e+01, threshold=5.293e+01, percent-clipped=0.0 2024-08-12 10:51:06,428 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 18 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-12 10:51:38,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=12.0 2024-08-12 10:51:46,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.65 vs. 
limit=10.0 2024-08-12 10:51:47,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1601410.0, ans=0.125 2024-08-12 10:51:54,289 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=10.0 2024-08-12 10:51:58,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1601510.0, ans=0.0 2024-08-12 10:52:03,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-12 10:52:06,011 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 10:52:10,691 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 750, loss[loss=0.1091, beats_loss=0.01003, ecapa_loss=0.0001473, whisper_loss=0.09762, over 17743.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001736, whisper_loss=0.09119, over 3735843.63 frames. ], batch size: 62, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:52:15,542 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 10:52:15,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1601610.0, ans=0.125 2024-08-12 10:52:17,194 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-12 10:52:19,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1601610.0, ans=0.125 2024-08-12 10:52:29,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. 
limit=15.0 2024-08-12 10:52:43,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1601810.0, ans=0.04949747468305833 2024-08-12 10:52:44,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1601810.0, ans=0.1 2024-08-12 10:53:10,973 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 10:53:15,632 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 10:53:29,896 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 800, loss[loss=0.1062, beats_loss=0.01074, ecapa_loss=0.0001437, whisper_loss=0.09399, over 19161.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001734, whisper_loss=0.09116, over 3755058.94 frames. ], batch size: 73, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:53:45,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.463e+01 2.797e+01 3.235e+01 6.542e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 10:53:48,924 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 10:53:52,199 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:53:56,550 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 10:54:08,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1602310.0, ans=0.1 2024-08-12 10:54:38,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=12.0 2024-08-12 10:54:44,257 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 10:54:50,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 850, loss[loss=0.1109, beats_loss=0.01187, ecapa_loss=0.0001555, whisper_loss=0.09743, over 21952.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001725, whisper_loss=0.09112, over 3790650.64 frames. ], batch size: 90, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:54:53,743 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 10:55:03,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1602610.0, ans=0.125 2024-08-12 10:55:08,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1602710.0, ans=0.125 2024-08-12 10:55:09,403 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 10:55:13,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2024-08-12 10:55:17,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1602710.0, ans=0.2 2024-08-12 10:55:23,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1602810.0, ans=0.125 2024-08-12 10:55:26,660 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 10:55:34,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.40 vs. 
limit=15.0 2024-08-12 10:55:49,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1602910.0, ans=0.0 2024-08-12 10:55:54,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1603010.0, ans=0.02 2024-08-12 10:56:06,449 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 10:56:09,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 900, loss[loss=0.1126, beats_loss=0.009199, ecapa_loss=0.0001647, whisper_loss=0.1018, over 18944.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001712, whisper_loss=0.09075, over 3802833.57 frames. ], batch size: 73, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:56:14,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1603110.0, ans=0.2 2024-08-12 10:56:18,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1603110.0, ans=0.02 2024-08-12 10:56:27,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-08-12 10:56:29,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.446e+01 2.685e+01 3.025e+01 4.659e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 10:56:37,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1603210.0, ans=0.125 2024-08-12 10:56:41,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. 
limit=15.0 2024-08-12 10:56:45,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1603210.0, ans=0.125 2024-08-12 10:56:47,070 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 10:56:56,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1603310.0, ans=0.125 2024-08-12 10:57:08,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-12 10:57:33,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 950, loss[loss=0.08842, beats_loss=0.01417, ecapa_loss=0.0001244, whisper_loss=0.07301, over 17014.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01098, ecapa_loss=0.0001698, whisper_loss=0.09027, over 3814562.65 frames. ], batch size: 67, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:57:39,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1603610.0, ans=0.0 2024-08-12 10:57:58,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1603710.0, ans=0.05 2024-08-12 10:58:12,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1603810.0, ans=0.1 2024-08-12 10:58:15,381 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 10:58:21,028 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 10:58:33,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1603910.0, ans=0.0 2024-08-12 10:58:43,335 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 10:58:50,873 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 10:59:03,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1604010.0, ans=0.0 2024-08-12 10:59:04,145 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 36 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 10:59:06,742 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 10:59:10,602 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1000, loss[loss=0.1138, beats_loss=0.01058, ecapa_loss=0.0001392, whisper_loss=0.1019, over 22379.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.0001691, whisper_loss=0.09027, over 3810001.93 frames. ], batch size: 84, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:59:15,475 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 10:59:32,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.574e+01 2.849e+01 3.275e+01 5.377e+01, threshold=5.697e+01, percent-clipped=1.0 2024-08-12 10:59:35,001 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 10:59:52,562 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
23 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 10:59:58,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1604310.0, ans=0.0 2024-08-12 11:00:01,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.78 vs. limit=10.0 2024-08-12 11:00:17,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1604410.0, ans=0.125 2024-08-12 11:00:23,570 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 11:00:24,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-12 11:00:28,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1604410.0, ans=0.09899494936611666 2024-08-12 11:00:28,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=15.0 2024-08-12 11:00:49,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1604510.0, ans=0.125 2024-08-12 11:00:58,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1050, loss[loss=0.09608, beats_loss=0.01249, ecapa_loss=0.0001413, whisper_loss=0.08218, over 17931.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001684, whisper_loss=0.09129, over 3822313.25 frames. 
], batch size: 71, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:01:15,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1604610.0, ans=0.1 2024-08-12 11:01:36,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1604710.0, ans=0.0 2024-08-12 11:01:46,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1604710.0, ans=0.125 2024-08-12 11:02:02,351 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 11:02:32,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1604910.0, ans=0.125 2024-08-12 11:02:37,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1605010.0, ans=0.0 2024-08-12 11:02:40,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1605010.0, ans=0.0 2024-08-12 11:02:45,270 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 11:02:48,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1605010.0, ans=0.2 2024-08-12 11:02:51,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-12 11:02:59,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1605110.0, ans=0.125 2024-08-12 11:03:01,271 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1100, loss[loss=0.1109, beats_loss=0.009009, ecapa_loss=0.0001986, whisper_loss=0.09993, over 22198.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.0108, ecapa_loss=0.0001707, whisper_loss=0.09234, over 3831257.10 frames. ], batch size: 91, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:03:15,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1605110.0, ans=0.125 2024-08-12 11:03:27,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.546e+01 2.827e+01 3.274e+01 5.638e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 11:03:35,502 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 11:03:40,893 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:03:43,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1605210.0, ans=0.125 2024-08-12 11:04:17,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1605410.0, ans=0.125 2024-08-12 11:04:59,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1605510.0, ans=0.0 2024-08-12 11:05:09,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1150, loss[loss=0.1232, beats_loss=0.00973, ecapa_loss=0.0001581, whisper_loss=0.1119, over 20167.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01081, ecapa_loss=0.0001702, whisper_loss=0.09216, over 3808815.52 frames. ], batch size: 79, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:05:19,378 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 11:05:19,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1605610.0, ans=0.125 2024-08-12 11:05:19,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1605610.0, ans=0.0 2024-08-12 11:05:27,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1605610.0, ans=0.125 2024-08-12 11:05:47,880 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 11:05:54,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1605710.0, ans=0.125 2024-08-12 11:06:17,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-12 11:06:17,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-12 11:06:19,140 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 11:06:21,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.75 vs. 
limit=15.0 2024-08-12 11:06:23,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1605810.0, ans=0.125 2024-08-12 11:06:29,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1605910.0, ans=0.125 2024-08-12 11:06:31,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2024-08-12 11:06:32,859 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 11:06:37,028 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 11:06:44,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1605910.0, ans=0.125 2024-08-12 11:07:07,828 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 11:07:10,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-12 11:07:14,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1200, loss[loss=0.1002, beats_loss=0.01239, ecapa_loss=0.0001651, whisper_loss=0.08615, over 22129.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001694, whisper_loss=0.09151, over 3825512.58 frames. ], batch size: 89, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:07:16,838 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 11:07:36,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.362e+01 2.610e+01 2.988e+01 4.824e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-12 11:07:56,719 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-12 11:08:02,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0 2024-08-12 11:08:51,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1606510.0, ans=0.05 2024-08-12 11:09:01,754 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 11:09:03,121 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1250, loss[loss=0.07941, beats_loss=0.01328, ecapa_loss=0.0001633, whisper_loss=0.0645, over 16334.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01104, ecapa_loss=0.0001691, whisper_loss=0.09083, over 3830800.86 frames. ], batch size: 66, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:09:09,637 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 11:09:33,667 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 11:10:01,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0 2024-08-12 11:10:04,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=12.0 2024-08-12 11:10:12,976 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 11:10:17,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. 
limit=15.0 2024-08-12 11:10:22,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1607010.0, ans=0.125 2024-08-12 11:10:28,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1300, loss[loss=0.1079, beats_loss=0.01052, ecapa_loss=0.0001563, whisper_loss=0.09582, over 15087.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.00017, whisper_loss=0.0912, over 3828754.54 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:10:39,379 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.20 vs. limit=15.0 2024-08-12 11:10:44,493 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.467e+01 2.705e+01 3.116e+01 5.074e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-12 11:10:50,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1607210.0, ans=0.0 2024-08-12 11:11:00,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=12.0 2024-08-12 11:11:05,826 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-12 11:11:24,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1607410.0, ans=0.1 2024-08-12 11:11:40,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1607510.0, ans=0.125 2024-08-12 11:11:41,920 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 11:11:49,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1350, loss[loss=0.1011, beats_loss=0.01428, ecapa_loss=0.0001155, whisper_loss=0.08562, over 19071.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001703, whisper_loss=0.09162, over 3812525.98 frames. ], batch size: 72, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:12:13,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-08-12 11:12:48,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1607910.0, ans=0.1 2024-08-12 11:12:58,600 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 11:13:06,301 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 11:13:08,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1608010.0, ans=0.0 2024-08-12 11:13:10,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1608110.0, ans=0.2 2024-08-12 11:13:11,496 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1400, loss[loss=0.1027, beats_loss=0.01103, ecapa_loss=0.0001511, whisper_loss=0.09012, over 16984.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01077, ecapa_loss=0.0001694, whisper_loss=0.09204, over 3806098.70 frames. 
], batch size: 68, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:13:13,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1608110.0, ans=0.0 2024-08-12 11:13:17,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1608110.0, ans=0.0 2024-08-12 11:13:27,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.408e+01 2.816e+01 3.296e+01 5.087e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-12 11:13:34,804 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-12 11:13:38,058 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 11:13:39,429 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 11:13:51,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1608310.0, ans=0.0 2024-08-12 11:13:54,369 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 11:13:57,747 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 11:13:58,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1608310.0, ans=0.125 2024-08-12 11:14:10,600 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.653e+05 2024-08-12 11:14:12,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1608410.0, ans=0.125 2024-08-12 11:14:26,783 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 11:14:28,556 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 11:14:30,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1608510.0, ans=0.2 2024-08-12 11:14:34,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1450, loss[loss=0.113, beats_loss=0.008658, ecapa_loss=0.0001561, whisper_loss=0.1027, over 15075.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01071, ecapa_loss=0.0001694, whisper_loss=0.09245, over 3780818.70 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:15:03,680 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 11:15:14,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2024-08-12 11:15:34,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1608810.0, ans=0.2 2024-08-12 11:15:35,878 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 20 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 11:15:49,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1608910.0, ans=0.0 2024-08-12 11:15:51,098 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-12 11:16:09,010 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 11:16:09,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1609010.0, ans=0.0 2024-08-12 11:16:22,119 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1500, loss[loss=0.1248, beats_loss=0.007577, ecapa_loss=0.0001817, whisper_loss=0.1154, over 16837.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.00017, whisper_loss=0.09159, over 3757587.20 frames. ], batch size: 64, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:16:25,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1609110.0, ans=0.2 2024-08-12 11:16:33,386 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 11:16:38,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.429e+01 2.735e+01 3.054e+01 5.898e+01, threshold=5.470e+01, percent-clipped=1.0 2024-08-12 11:17:02,715 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.702e+02 2024-08-12 11:17:07,983 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 11:17:16,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1609410.0, ans=0.2 2024-08-12 11:17:17,935 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 11:17:28,734 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 8 from Vox, 24 fro AS 2024-08-12 11:17:52,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1550, loss[loss=0.08885, beats_loss=0.01164, ecapa_loss=0.0001611, whisper_loss=0.0756, over 15086.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001696, whisper_loss=0.09157, over 3756449.09 frames. ], batch size: 61, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:17:56,524 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 36 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 11:18:13,928 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 11:18:17,450 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
32 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 11:18:22,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1609710.0, ans=0.0 2024-08-12 11:18:35,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1609810.0, ans=0.2 2024-08-12 11:18:38,237 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 11:18:47,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1609910.0, ans=0.125 2024-08-12 11:19:04,392 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 12 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 11:19:15,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-12 11:19:19,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1600, loss[loss=0.1177, beats_loss=0.009446, ecapa_loss=0.0002015, whisper_loss=0.1062, over 17465.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001682, whisper_loss=0.09203, over 3791395.09 frames. ], batch size: 68, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:19:27,164 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 11:19:36,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.594e+01 2.878e+01 3.251e+01 6.117e+01, threshold=5.756e+01, percent-clipped=2.0 2024-08-12 11:19:44,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1610210.0, ans=0.0 2024-08-12 11:19:46,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1610210.0, ans=0.125 2024-08-12 11:19:52,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1610210.0, ans=0.0 2024-08-12 11:20:25,370 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-12 11:20:27,267 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 11:20:41,612 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 11:20:45,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1650, loss[loss=0.1144, beats_loss=0.009566, ecapa_loss=0.0001817, whisper_loss=0.103, over 23087.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001695, whisper_loss=0.09213, over 3789164.23 frames. ], batch size: 92, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:20:59,756 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 11:21:19,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1610810.0, ans=0.125 2024-08-12 11:21:30,929 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 11:21:37,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2024-08-12 11:21:49,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1610910.0, ans=0.1 2024-08-12 11:22:07,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1611110.0, ans=0.0 2024-08-12 11:22:07,534 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:22:08,523 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1700, loss[loss=0.09574, beats_loss=0.01306, ecapa_loss=0.0001883, whisper_loss=0.0808, over 14069.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.0001684, whisper_loss=0.09195, over 3763499.52 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:22:08,628 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 11:22:11,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-12 11:22:15,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1611110.0, ans=0.125 2024-08-12 11:22:16,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1611110.0, ans=0.05 2024-08-12 11:22:23,084 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 11:22:24,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.487e+01 2.798e+01 3.265e+01 1.299e+02, threshold=5.596e+01, percent-clipped=2.0 2024-08-12 11:22:30,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1611210.0, ans=0.0 2024-08-12 11:22:44,854 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2024-08-12 11:22:45,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1611310.0, ans=0.0 2024-08-12 11:23:13,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1611510.0, ans=0.2 2024-08-12 11:23:29,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1750, loss[loss=0.1041, beats_loss=0.00912, ecapa_loss=0.0001411, whisper_loss=0.09362, over 16512.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001685, whisper_loss=0.09186, over 3747195.89 frames. ], batch size: 59, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:23:31,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1611610.0, ans=0.0 2024-08-12 11:23:39,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1611610.0, ans=0.125 2024-08-12 11:23:53,934 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 11:24:02,354 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
21 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-12 11:24:05,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0 2024-08-12 11:24:22,707 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 11:24:25,697 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 11:24:49,894 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1800, loss[loss=0.103, beats_loss=0.01241, ecapa_loss=0.0001554, whisper_loss=0.08903, over 15275.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001694, whisper_loss=0.09183, over 3741318.11 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:24:52,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1612110.0, ans=0.07 2024-08-12 11:24:55,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0 2024-08-12 11:25:05,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.474e+01 2.742e+01 2.995e+01 4.904e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-12 11:25:07,638 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 11:25:44,765 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-12 11:25:57,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1612510.0, ans=0.125 2024-08-12 11:26:14,763 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1850, loss[loss=0.1059, beats_loss=0.01136, ecapa_loss=0.0002101, whisper_loss=0.09247, over 18373.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001698, whisper_loss=0.09128, over 3740322.21 frames. ], batch size: 78, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:26:51,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.79 vs. limit=22.5 2024-08-12 11:26:52,107 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 11:26:56,890 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 11:27:09,093 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-12 11:27:20,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1612910.0, ans=0.0 2024-08-12 11:27:27,542 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 11:27:40,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1613010.0, ans=22.5 2024-08-12 11:27:42,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1613010.0, ans=0.125 2024-08-12 11:27:44,642 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:28:04,046 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1900, loss[loss=0.1069, beats_loss=0.01013, ecapa_loss=0.0001536, whisper_loss=0.09524, over 15487.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001699, whisper_loss=0.09149, over 3746494.47 frames. ], batch size: 62, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:28:08,860 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
23 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-12 11:28:15,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1613110.0, ans=0.125 2024-08-12 11:28:17,398 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 11:28:26,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.546e+01 2.864e+01 3.475e+01 5.350e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-12 11:28:47,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2024-08-12 11:29:21,218 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 11:29:21,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1613410.0, ans=0.125 2024-08-12 11:29:22,734 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 11:29:24,698 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 11:29:44,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 1950, loss[loss=0.09254, beats_loss=0.01003, ecapa_loss=0.0002522, whisper_loss=0.07999, over 14154.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001708, whisper_loss=0.09117, over 3720192.31 frames. ], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:29:53,403 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-12 11:29:56,271 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
20 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 11:30:00,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-12 11:30:02,802 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 17 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 11:30:28,551 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-12 11:30:30,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1613810.0, ans=0.125 2024-08-12 11:30:33,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1613910.0, ans=0.125 2024-08-12 11:30:36,816 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 11:30:43,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1613910.0, ans=0.0 2024-08-12 11:31:05,518 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2000, loss[loss=0.1087, beats_loss=0.0101, ecapa_loss=0.0002068, whisper_loss=0.0965, over 17070.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001721, whisper_loss=0.09158, over 3709982.09 frames. ], batch size: 72, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:31:09,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1614110.0, ans=0.125 2024-08-12 11:31:10,317 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
13 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 11:31:20,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.474e+01 2.700e+01 3.035e+01 6.607e+01, threshold=5.401e+01, percent-clipped=2.0 2024-08-12 11:31:34,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1614210.0, ans=0.125 2024-08-12 11:31:39,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1614310.0, ans=0.1 2024-08-12 11:31:40,928 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 11:32:04,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1614410.0, ans=0.125 2024-08-12 11:32:15,731 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 11:32:24,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2050, loss[loss=0.1126, beats_loss=0.01226, ecapa_loss=0.0001329, whisper_loss=0.09899, over 23654.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001709, whisper_loss=0.09178, over 3740618.80 frames. ], batch size: 92, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:32:47,471 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 11:32:58,362 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 11:32:58,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1614810.0, ans=0.125 2024-08-12 11:33:00,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1614810.0, ans=0.1 2024-08-12 11:33:08,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1614810.0, ans=0.125 2024-08-12 11:33:35,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-12 11:33:37,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.38 vs. limit=22.5 2024-08-12 11:33:38,611 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 11:33:46,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2100, loss[loss=0.1003, beats_loss=0.007251, ecapa_loss=0.0001579, whisper_loss=0.09148, over 18807.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.0001706, whisper_loss=0.09226, over 3787927.24 frames. ], batch size: 67, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:34:02,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.515e+01 2.855e+01 3.226e+01 9.750e+01, threshold=5.709e+01, percent-clipped=2.0 2024-08-12 11:34:07,213 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 11:34:12,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1615210.0, ans=0.0 2024-08-12 11:34:18,486 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 11:34:20,219 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.028e-02 2024-08-12 11:34:26,114 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 11:34:31,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1615310.0, ans=0.0 2024-08-12 11:34:40,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1615410.0, ans=0.2 2024-08-12 11:34:52,787 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 11:34:53,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1615510.0, ans=0.125 2024-08-12 11:34:57,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1615510.0, ans=0.05 2024-08-12 11:34:57,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1615510.0, ans=0.0 2024-08-12 11:35:04,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2150, loss[loss=0.1152, beats_loss=0.0116, ecapa_loss=0.0001646, whisper_loss=0.1019, over 23197.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.09222, over 3818653.17 frames. ], batch size: 92, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:35:37,564 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 11:36:23,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2200, loss[loss=0.1208, beats_loss=0.009073, ecapa_loss=0.0001753, whisper_loss=0.11, over 18993.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001726, whisper_loss=0.09254, over 3823138.04 frames. ], batch size: 74, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:36:25,390 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 11:36:37,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1616110.0, ans=0.0 2024-08-12 11:36:40,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.491e+01 2.779e+01 3.104e+01 1.679e+02, threshold=5.558e+01, percent-clipped=1.0 2024-08-12 11:36:54,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1616210.0, ans=0.125 2024-08-12 11:36:58,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1616310.0, ans=0.1 2024-08-12 11:36:59,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1616310.0, ans=0.125 2024-08-12 11:37:10,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1616310.0, ans=0.07 2024-08-12 11:37:14,824 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 11:37:19,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1616410.0, ans=0.125 2024-08-12 11:37:33,876 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 17 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-12 11:37:41,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=22.5 2024-08-12 11:37:43,827 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
20 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 11:37:44,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2250, loss[loss=0.08939, beats_loss=0.01308, ecapa_loss=0.0001486, whisper_loss=0.07483, over 20908.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.0001726, whisper_loss=0.09203, over 3824845.21 frames. ], batch size: 84, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:38:04,348 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 11:38:07,328 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 11:38:25,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1616810.0, ans=0.125 2024-08-12 11:38:47,950 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 11:38:51,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:38:55,853 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-12 11:38:59,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1617010.0, ans=0.1 2024-08-12 11:39:04,521 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 20 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-12 11:39:05,905 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2300, loss[loss=0.07936, beats_loss=0.01407, ecapa_loss=0.0001632, whisper_loss=0.06366, over 22257.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.000173, whisper_loss=0.09229, over 3852123.14 frames. 
], batch size: 92, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:39:10,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1617110.0, ans=0.0 2024-08-12 11:39:13,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1617110.0, ans=0.2 2024-08-12 11:39:16,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1617110.0, ans=0.0 2024-08-12 11:39:22,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.578e+01 2.776e+01 3.127e+01 7.036e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 11:39:37,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1617310.0, ans=0.1 2024-08-12 11:40:10,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1617510.0, ans=0.0 2024-08-12 11:40:14,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2024-08-12 11:40:16,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1617510.0, ans=0.0 2024-08-12 11:40:19,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1617510.0, ans=0.2 2024-08-12 11:40:25,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2350, loss[loss=0.09666, beats_loss=0.01238, ecapa_loss=0.0001599, whisper_loss=0.08268, over 22559.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001734, whisper_loss=0.0916, over 3844666.10 frames. 
], batch size: 91, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:40:29,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-12 11:40:29,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-12 11:40:29,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2024-08-12 11:40:37,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1617610.0, ans=0.2 2024-08-12 11:40:40,135 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-12 11:41:17,065 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 11:41:28,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1617910.0, ans=0.125 2024-08-12 11:41:32,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1618010.0, ans=0.125 2024-08-12 11:41:43,239 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 11:41:47,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2400, loss[loss=0.09001, beats_loss=0.01101, ecapa_loss=0.0002156, whisper_loss=0.07685, over 21476.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01102, ecapa_loss=0.0001741, whisper_loss=0.09162, over 3828593.65 frames. 
], batch size: 90, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:41:58,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1618110.0, ans=0.125 2024-08-12 11:42:00,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1618110.0, ans=0.2 2024-08-12 11:42:03,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.471e+01 2.708e+01 3.082e+01 4.957e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-12 11:42:14,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1618210.0, ans=0.2 2024-08-12 11:42:28,910 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 11:43:06,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2450, loss[loss=0.09955, beats_loss=0.01175, ecapa_loss=0.0001696, whisper_loss=0.0861, over 20375.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001743, whisper_loss=0.09154, over 3868538.42 frames. ], batch size: 84, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:43:07,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-12 11:43:22,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1618710.0, ans=0.125 2024-08-12 11:43:30,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1618710.0, ans=0.2 2024-08-12 11:43:56,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.45 vs. 
limit=15.0 2024-08-12 11:44:02,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-08-12 11:44:49,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2500, loss[loss=0.08598, beats_loss=0.01379, ecapa_loss=0.0001806, whisper_loss=0.07039, over 22324.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001724, whisper_loss=0.09124, over 3854534.90 frames. ], batch size: 94, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:45:07,525 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.757e-02 2024-08-12 11:45:10,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.514e+01 2.796e+01 3.106e+01 8.282e+01, threshold=5.592e+01, percent-clipped=2.0 2024-08-12 11:45:17,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1619210.0, ans=0.5 2024-08-12 11:45:25,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1619210.0, ans=0.125 2024-08-12 11:45:39,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1619310.0, ans=0.1 2024-08-12 11:46:00,517 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 40 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-12 11:46:01,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1619410.0, ans=0.0 2024-08-12 11:46:36,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. 
limit=15.0 2024-08-12 11:46:39,339 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2550, loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0002453, whisper_loss=0.08932, over 21084.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001735, whisper_loss=0.09192, over 3876469.78 frames. ], batch size: 92, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:46:42,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1619610.0, ans=0.0 2024-08-12 11:47:06,286 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 11:47:10,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-08-12 11:47:36,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1619910.0, ans=0.1 2024-08-12 11:47:48,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1620010.0, ans=0.1 2024-08-12 11:47:53,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1620010.0, ans=0.125 2024-08-12 11:48:05,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2600, loss[loss=0.07238, beats_loss=0.01437, ecapa_loss=0.0001188, whisper_loss=0.05682, over 21487.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001734, whisper_loss=0.09134, over 3859549.91 frames. 
], batch size: 86, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:48:07,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1620110.0, ans=0.0 2024-08-12 11:48:09,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1620110.0, ans=0.1 2024-08-12 11:48:21,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.606e+01 2.871e+01 3.471e+01 6.871e+01, threshold=5.743e+01, percent-clipped=3.0 2024-08-12 11:48:24,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-08-12 11:48:38,973 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 11:48:42,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1620310.0, ans=0.1 2024-08-12 11:48:49,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1620310.0, ans=0.1 2024-08-12 11:49:24,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2650, loss[loss=0.1013, beats_loss=0.0127, ecapa_loss=0.0001383, whisper_loss=0.08724, over 23613.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01091, ecapa_loss=0.0001735, whisper_loss=0.09235, over 3866963.08 frames. ], batch size: 93, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:49:24,988 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 11:49:37,790 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 11:49:50,113 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 11:49:59,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1620810.0, ans=0.125 2024-08-12 11:50:27,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.80 vs. limit=10.0 2024-08-12 11:50:30,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1621010.0, ans=0.2 2024-08-12 11:50:42,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2700, loss[loss=0.1076, beats_loss=0.01081, ecapa_loss=0.0001585, whisper_loss=0.09523, over 15065.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01103, ecapa_loss=0.0001728, whisper_loss=0.09102, over 3881745.66 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:50:44,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1621110.0, ans=0.125 2024-08-12 11:50:45,739 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.191e-01 2024-08-12 11:50:50,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1621110.0, ans=0.1 2024-08-12 11:50:58,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.509e+01 2.801e+01 3.158e+01 4.809e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 11:51:16,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1621310.0, ans=0.0 2024-08-12 11:51:16,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1621310.0, ans=0.125 2024-08-12 11:51:23,141 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1621310.0, ans=0.0 2024-08-12 11:51:25,881 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 11:51:32,358 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 11:51:42,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1621410.0, ans=0.125 2024-08-12 11:51:43,940 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 14 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 11:51:59,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=22.5 2024-08-12 11:52:02,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2750, loss[loss=0.1079, beats_loss=0.01095, ecapa_loss=0.00017, whisper_loss=0.09525, over 22798.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001739, whisper_loss=0.09117, over 3832777.84 frames. ], batch size: 94, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:52:15,518 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 11:52:29,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1621710.0, ans=0.0 2024-08-12 11:52:48,480 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 11:52:55,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-12 11:52:57,338 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 11:53:05,612 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
34 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 11:53:22,143 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2800, loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.000151, whisper_loss=0.09295, over 14545.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001743, whisper_loss=0.09152, over 3849796.09 frames. ], batch size: 55, lr: 5.33e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:53:25,582 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 11:53:37,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.464e+01 2.680e+01 3.068e+01 4.016e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-12 11:53:44,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1622210.0, ans=0.125 2024-08-12 11:53:45,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1622210.0, ans=10.0 2024-08-12 11:53:48,713 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-12 11:54:13,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-12 11:54:43,471 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2850, loss[loss=0.1113, beats_loss=0.01208, ecapa_loss=0.0001576, whisper_loss=0.09767, over 23106.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001734, whisper_loss=0.09122, over 3835623.32 frames. 
], batch size: 89, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:54:50,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1622610.0, ans=0.1 2024-08-12 11:55:00,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1622710.0, ans=0.1 2024-08-12 11:55:15,982 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 11:55:22,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1622810.0, ans=0.2 2024-08-12 11:55:30,759 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 11:55:34,078 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 11:55:50,026 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 11:55:50,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1623010.0, ans=0.125 2024-08-12 11:55:50,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1623010.0, ans=0.125 2024-08-12 11:55:51,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1623010.0, ans=0.125 2024-08-12 11:55:53,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-12 11:56:04,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2900, loss[loss=0.1208, beats_loss=0.008985, ecapa_loss=0.0001828, whisper_loss=0.11, over 22470.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.0001736, whisper_loss=0.09168, over 3880340.61 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:56:20,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.518e+01 2.817e+01 3.035e+01 4.423e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-12 11:56:48,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1623310.0, ans=0.0 2024-08-12 11:56:51,131 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 11:57:05,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1623410.0, ans=0.0 2024-08-12 11:57:25,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 2950, loss[loss=0.09274, beats_loss=0.01174, ecapa_loss=0.0001623, whisper_loss=0.07938, over 20300.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001751, whisper_loss=0.09161, over 3924046.01 frames. ], batch size: 82, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:57:25,659 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 11:57:37,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1623610.0, ans=0.125 2024-08-12 11:57:54,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1623710.0, ans=0.0 2024-08-12 11:57:58,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1623810.0, ans=0.2 2024-08-12 11:58:02,923 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 11:58:33,684 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-12 11:58:34,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1624010.0, ans=0.2 2024-08-12 11:58:38,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1624010.0, ans=0.125 2024-08-12 11:58:44,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3000, loss[loss=0.0921, beats_loss=0.01238, ecapa_loss=0.0001249, whisper_loss=0.07847, over 17172.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.0001755, whisper_loss=0.09179, over 3935934.36 frames. ], batch size: 65, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:58:44,657 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 11:59:25,786 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on ASR_libri: loss=0.256, beats_loss=0, ecapa_loss=0.0005941, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 11:59:44,928 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on SV_voxceleb1: loss=0.00471, beats_loss=0, ecapa_loss=0.000471, whisper_loss=0, over 939242.00 frames. 2024-08-12 12:00:15,147 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8240, 1.6293, 1.9387, 2.1150], device='cuda:1') 2024-08-12 12:00:21,078 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3396, 1.9872, 1.9809, 1.4449], device='cuda:1') 2024-08-12 12:01:46,920 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on AT_audioset: loss=0.02429, beats_loss=0.02429, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 12:01:46,924 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 12:02:03,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.553e+01 2.970e+01 3.483e+01 4.771e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-12 12:02:04,069 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 12:02:35,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1624410.0, ans=0.025 2024-08-12 12:02:41,194 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 12:02:43,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1624410.0, ans=0.0 2024-08-12 12:02:57,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1624510.0, ans=0.07 2024-08-12 12:02:58,565 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 12:03:02,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624510.0, ans=0.1 2024-08-12 12:03:05,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3050, loss[loss=0.09483, beats_loss=0.01226, ecapa_loss=0.0001508, whisper_loss=0.08106, over 17570.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01108, ecapa_loss=0.0001752, whisper_loss=0.09122, over 3912064.08 frames. ], batch size: 70, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:03:18,709 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-12 12:03:53,126 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 12:03:56,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1624910.0, ans=0.1 2024-08-12 12:03:59,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-08-12 12:04:11,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1625010.0, ans=0.125 2024-08-12 12:04:14,701 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-12 12:04:25,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3100, loss[loss=0.1286, beats_loss=0.01013, ecapa_loss=0.0001596, whisper_loss=0.1169, over 17381.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001765, whisper_loss=0.09185, over 3910933.03 frames. ], batch size: 66, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:04:30,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1625110.0, ans=0.1 2024-08-12 12:04:36,524 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 12:04:42,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.511e+01 2.833e+01 3.211e+01 6.314e+01, threshold=5.667e+01, percent-clipped=1.0 2024-08-12 12:04:53,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1625210.0, ans=0.125 2024-08-12 12:05:04,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1625310.0, ans=0.125 2024-08-12 12:05:28,488 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 12:05:30,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1625510.0, ans=0.07 2024-08-12 12:05:30,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1625510.0, ans=0.1 2024-08-12 12:05:35,228 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 12:05:41,294 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 12:05:44,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3150, loss[loss=0.1062, beats_loss=0.009321, ecapa_loss=0.0002074, whisper_loss=0.09485, over 17692.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01105, ecapa_loss=0.0001758, whisper_loss=0.0926, over 3911155.70 frames. ], batch size: 69, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:06:00,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1625710.0, ans=0.125 2024-08-12 12:06:01,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2024-08-12 12:06:05,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-12 12:06:06,728 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-12 12:06:47,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1626010.0, ans=0.125 2024-08-12 12:06:50,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1626010.0, ans=0.0 2024-08-12 12:06:55,762 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 12:07:03,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3200, loss[loss=0.09666, beats_loss=0.01307, ecapa_loss=0.0002313, whisper_loss=0.08127, over 21682.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01101, ecapa_loss=0.0001775, whisper_loss=0.09223, over 3884064.92 frames. ], batch size: 94, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:07:10,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2024-08-12 12:07:11,950 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 12:07:21,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.397e+01 2.776e+01 3.062e+01 4.690e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 12:07:52,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1626410.0, ans=0.125 2024-08-12 12:07:55,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-12 12:07:56,570 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 12:08:04,899 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 12:08:14,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1626510.0, ans=0.125 2024-08-12 12:08:17,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1626510.0, ans=0.0 2024-08-12 12:08:21,930 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 12:08:22,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1626610.0, ans=0.2 2024-08-12 12:08:22,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3250, loss[loss=0.1043, beats_loss=0.01167, ecapa_loss=0.0001609, whisper_loss=0.09101, over 21927.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01091, ecapa_loss=0.0001763, whisper_loss=0.0937, over 3919098.77 frames. ], batch size: 88, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:08:40,206 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 12:08:48,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0 2024-08-12 12:08:50,630 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 12:08:58,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1626810.0, ans=0.125 2024-08-12 12:09:01,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1626810.0, ans=0.0 2024-08-12 12:09:08,797 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 12:09:15,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-12 12:09:20,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1626910.0, ans=0.125 2024-08-12 12:09:25,286 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.841e+00 2024-08-12 12:09:28,295 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 12:09:30,227 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.757e+00 2024-08-12 12:09:42,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3300, loss[loss=0.1034, beats_loss=0.01198, ecapa_loss=0.0001831, whisper_loss=0.08961, over 18714.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.000176, whisper_loss=0.09252, over 3919667.03 frames. ], batch size: 77, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:09:58,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.622e+01 3.065e+01 3.686e+01 1.090e+02, threshold=6.129e+01, percent-clipped=1.0 2024-08-12 12:10:23,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1627310.0, ans=0.1 2024-08-12 12:10:39,313 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 12:10:39,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1627410.0, ans=0.125 2024-08-12 12:10:45,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1627510.0, ans=0.1 2024-08-12 12:10:55,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1627510.0, ans=0.1 2024-08-12 12:10:59,345 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3350, loss[loss=0.1215, beats_loss=0.01141, ecapa_loss=0.0001652, whisper_loss=0.1084, over 22704.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01095, ecapa_loss=0.0001777, whisper_loss=0.09311, over 3918827.64 frames. ], batch size: 91, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:11:43,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0 2024-08-12 12:11:46,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1627910.0, ans=0.1 2024-08-12 12:12:03,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1628010.0, ans=0.0 2024-08-12 12:12:07,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1628010.0, ans=0.125 2024-08-12 12:12:11,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. 
limit=15.0 2024-08-12 12:12:17,484 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3400, loss[loss=0.1071, beats_loss=0.01246, ecapa_loss=0.0001615, whisper_loss=0.09299, over 22482.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.000177, whisper_loss=0.09192, over 3895745.74 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:12:18,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1628110.0, ans=0.125 2024-08-12 12:12:26,700 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.22 vs. limit=10.0 2024-08-12 12:12:35,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.455e+01 2.782e+01 3.017e+01 1.106e+02, threshold=5.563e+01, percent-clipped=1.0 2024-08-12 12:12:50,034 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 12:13:05,296 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 12:13:07,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1628410.0, ans=0.0 2024-08-12 12:13:20,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-12 12:13:36,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3450, loss[loss=0.09323, beats_loss=0.01124, ecapa_loss=0.0002048, whisper_loss=0.07994, over 19179.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01102, ecapa_loss=0.0001778, whisper_loss=0.09142, over 3870979.27 frames. 
], batch size: 78, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:13:38,422 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.918e+01 2024-08-12 12:13:47,341 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 12:13:50,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1628710.0, ans=0.2 2024-08-12 12:13:51,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=22.5 2024-08-12 12:14:04,146 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-12 12:14:05,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1628810.0, ans=0.125 2024-08-12 12:14:07,317 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 12:14:13,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1628810.0, ans=0.2 2024-08-12 12:14:20,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1628810.0, ans=0.0 2024-08-12 12:14:33,511 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 12:14:35,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1628910.0, ans=0.125 2024-08-12 12:14:48,880 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 12:14:53,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3500, loss[loss=0.09611, beats_loss=0.01226, ecapa_loss=0.000175, whisper_loss=0.0821, over 21913.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001778, whisper_loss=0.09142, over 3904806.35 frames. ], batch size: 91, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:15:01,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1629110.0, ans=0.0 2024-08-12 12:15:02,705 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 12:15:10,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.491e+01 2.788e+01 3.215e+01 5.809e+01, threshold=5.577e+01, percent-clipped=2.0 2024-08-12 12:15:24,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1629310.0, ans=0.125 2024-08-12 12:15:25,318 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-12 12:15:30,145 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:15:32,672 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 12:15:36,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1629310.0, ans=0.07 2024-08-12 12:16:00,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1629510.0, ans=0.125 2024-08-12 12:16:12,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3550, loss[loss=0.1131, beats_loss=0.009862, ecapa_loss=0.0002109, whisper_loss=0.1012, over 21282.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001771, whisper_loss=0.09143, over 3911256.55 frames. 
], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:16:24,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0 2024-08-12 12:16:29,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1629710.0, ans=0.125 2024-08-12 12:16:40,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2024-08-12 12:17:17,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1630010.0, ans=0.125 2024-08-12 12:17:28,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3600, loss[loss=0.1085, beats_loss=0.01312, ecapa_loss=0.0001648, whisper_loss=0.09372, over 19519.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.0001771, whisper_loss=0.09168, over 3883556.58 frames. ], batch size: 78, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:17:30,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1630110.0, ans=0.125 2024-08-12 12:17:45,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.537e+01 2.866e+01 3.271e+01 6.335e+01, threshold=5.732e+01, percent-clipped=1.0 2024-08-12 12:17:47,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1630210.0, ans=0.125 2024-08-12 12:17:54,890 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 12:18:05,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1630310.0, ans=0.125 2024-08-12 12:18:07,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2024-08-12 12:18:32,162 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 12:18:45,089 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 12:18:46,531 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3650, loss[loss=0.1053, beats_loss=0.01432, ecapa_loss=0.0001351, whisper_loss=0.08968, over 17493.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01104, ecapa_loss=0.0001769, whisper_loss=0.09153, over 3876463.08 frames. ], batch size: 66, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:18:48,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1630610.0, ans=0.1 2024-08-12 12:18:59,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1630610.0, ans=0.2 2024-08-12 12:19:00,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.73 vs. limit=22.5 2024-08-12 12:19:01,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1630710.0, ans=0.125 2024-08-12 12:19:10,379 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
35 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 12:19:21,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1630810.0, ans=0.2 2024-08-12 12:19:35,000 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 12:19:35,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1630910.0, ans=0.125 2024-08-12 12:20:05,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3700, loss[loss=0.1018, beats_loss=0.009539, ecapa_loss=0.0001932, whisper_loss=0.09033, over 19762.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01092, ecapa_loss=0.000178, whisper_loss=0.09244, over 3867857.09 frames. ], batch size: 81, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:20:19,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0 2024-08-12 12:20:23,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.690e+01 3.090e+01 3.461e+01 6.737e+01, threshold=6.180e+01, percent-clipped=1.0 2024-08-12 12:20:32,195 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 12:20:37,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1631310.0, ans=0.125 2024-08-12 12:20:49,533 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 12:21:07,901 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 12:21:11,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1631510.0, ans=0.1 2024-08-12 12:21:15,508 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 12:21:20,449 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-12 12:21:24,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3750, loss[loss=0.09512, beats_loss=0.01359, ecapa_loss=0.0002069, whisper_loss=0.07946, over 18519.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.000178, whisper_loss=0.09174, over 3846744.55 frames. ], batch size: 79, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:21:28,144 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 33 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-12 12:21:33,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1631610.0, ans=0.1 2024-08-12 12:21:38,109 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-12 12:21:48,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1631710.0, ans=0.0 2024-08-12 12:21:49,215 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 12:21:49,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1631710.0, ans=0.125 2024-08-12 12:21:52,413 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 12:21:57,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1631810.0, ans=0.07 2024-08-12 12:22:15,108 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 12:22:26,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-12 12:22:26,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=12.0 2024-08-12 12:22:29,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1632010.0, ans=0.125 2024-08-12 12:22:40,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1632010.0, ans=0.2 2024-08-12 12:22:44,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3800, loss[loss=0.1134, beats_loss=0.0123, ecapa_loss=0.0001583, whisper_loss=0.09956, over 15690.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001783, whisper_loss=0.09231, over 3883789.40 frames. ], batch size: 64, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:22:46,268 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
19 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-12 12:22:59,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1632210.0, ans=0.5 2024-08-12 12:23:02,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.541e+01 2.857e+01 3.346e+01 7.613e+01, threshold=5.713e+01, percent-clipped=1.0 2024-08-12 12:23:22,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1632310.0, ans=0.125 2024-08-12 12:23:23,691 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 12:23:31,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=22.5 2024-08-12 12:23:43,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-12 12:23:59,096 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 12:24:02,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3850, loss[loss=0.09442, beats_loss=0.01043, ecapa_loss=0.0002122, whisper_loss=0.08187, over 15966.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.000178, whisper_loss=0.09223, over 3874467.71 frames. ], batch size: 65, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:24:21,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1632710.0, ans=0.0 2024-08-12 12:24:28,680 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 8 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 12:24:56,681 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 12:25:19,309 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.055e-01 2024-08-12 12:25:19,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1633010.0, ans=0.125 2024-08-12 12:25:21,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1633110.0, ans=0.0 2024-08-12 12:25:22,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3900, loss[loss=0.1109, beats_loss=0.008857, ecapa_loss=0.0001923, whisper_loss=0.1001, over 17355.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001781, whisper_loss=0.09323, over 3889806.82 frames. ], batch size: 67, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:25:22,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1633110.0, ans=0.0 2024-08-12 12:25:22,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1633110.0, ans=0.05 2024-08-12 12:25:23,432 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 12:25:39,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.512e+01 2.803e+01 3.159e+01 7.102e+01, threshold=5.607e+01, percent-clipped=1.0 2024-08-12 12:25:44,782 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 12:25:49,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1633210.0, ans=0.125 2024-08-12 12:26:09,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1633410.0, ans=0.0 2024-08-12 12:26:23,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=12.0 2024-08-12 12:26:25,261 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 12:26:40,686 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 12:26:41,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 3950, loss[loss=0.0999, beats_loss=0.01252, ecapa_loss=0.0001756, whisper_loss=0.08562, over 22766.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01093, ecapa_loss=0.000177, whisper_loss=0.09378, over 3905717.72 frames. 
], batch size: 90, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:26:43,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1633610.0, ans=0.1 2024-08-12 12:26:58,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1633710.0, ans=0.0 2024-08-12 12:27:20,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1633810.0, ans=0.0 2024-08-12 12:27:22,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1633810.0, ans=0.0 2024-08-12 12:27:37,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1633910.0, ans=0.0 2024-08-12 12:27:54,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1634010.0, ans=0.125 2024-08-12 12:27:59,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1634110.0, ans=0.0 2024-08-12 12:28:00,443 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4000, loss[loss=0.08571, beats_loss=0.0132, ecapa_loss=0.0001818, whisper_loss=0.07069, over 21758.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01103, ecapa_loss=0.0001764, whisper_loss=0.09287, over 3906728.69 frames. 
], batch size: 95, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:28:09,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1634110.0, ans=0.125 2024-08-12 12:28:16,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.585e+01 2.882e+01 3.381e+01 6.617e+01, threshold=5.764e+01, percent-clipped=3.0 2024-08-12 12:28:17,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1634210.0, ans=0.1 2024-08-12 12:28:24,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-12 12:28:27,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1634210.0, ans=0.125 2024-08-12 12:28:33,874 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 12:28:49,562 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 12:28:59,806 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 12:29:15,775 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 12:29:18,872 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4050, loss[loss=0.1114, beats_loss=0.01174, ecapa_loss=0.0001493, whisper_loss=0.09817, over 22019.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01101, ecapa_loss=0.0001777, whisper_loss=0.09231, over 3876594.24 frames. 
], batch size: 88, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:29:34,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1634710.0, ans=0.2 2024-08-12 12:29:35,649 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 12:29:48,878 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 12:30:06,648 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 12:30:36,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1635010.0, ans=22.5 2024-08-12 12:30:39,528 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4100, loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0002024, whisper_loss=0.09116, over 15864.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01091, ecapa_loss=0.0001781, whisper_loss=0.09319, over 3891749.05 frames. ], batch size: 64, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:30:56,838 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.490e+01 2.729e+01 3.052e+01 9.662e+01, threshold=5.458e+01, percent-clipped=1.0 2024-08-12 12:31:01,927 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. 
limit=6.0 2024-08-12 12:31:08,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1635210.0, ans=0.0 2024-08-12 12:31:11,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1635310.0, ans=0.0 2024-08-12 12:31:11,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1635310.0, ans=0.125 2024-08-12 12:31:18,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1635310.0, ans=0.0 2024-08-12 12:31:23,364 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-12 12:31:25,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1635310.0, ans=0.125 2024-08-12 12:31:53,260 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0 2024-08-12 12:32:00,365 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4150, loss[loss=0.09052, beats_loss=0.01503, ecapa_loss=0.0001323, whisper_loss=0.07417, over 22836.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01096, ecapa_loss=0.0001784, whisper_loss=0.09305, over 3890250.43 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:32:00,508 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 12:32:34,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5 2024-08-12 12:33:08,150 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 12:33:14,732 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 12:33:20,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4200, loss[loss=0.1083, beats_loss=0.01213, ecapa_loss=0.0001565, whisper_loss=0.09458, over 18040.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01101, ecapa_loss=0.0001777, whisper_loss=0.09303, over 3893272.91 frames. ], batch size: 71, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:33:23,505 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 12:33:28,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1636110.0, ans=0.0 2024-08-12 12:33:31,374 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 12:33:31,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1636110.0, ans=0.125 2024-08-12 12:33:32,950 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 12:33:37,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.467e+01 2.734e+01 3.043e+01 4.289e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 12:33:45,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636210.0, ans=0.1 2024-08-12 12:33:52,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1636310.0, ans=0.125 2024-08-12 12:33:59,442 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 12:34:04,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2024-08-12 12:34:15,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1636410.0, ans=0.125 2024-08-12 12:34:22,783 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-12 12:34:27,837 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 12:34:39,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4250, loss[loss=0.1323, beats_loss=0.009534, ecapa_loss=0.000151, whisper_loss=0.1212, over 24243.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01099, ecapa_loss=0.0001773, whisper_loss=0.09352, over 3911902.23 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:34:42,932 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 12:34:49,593 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 12:35:04,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1636710.0, ans=10.0 2024-08-12 12:35:35,046 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 33 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-12 12:35:52,915 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 12:35:53,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1637010.0, ans=0.0 2024-08-12 12:35:58,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4300, loss[loss=0.083, beats_loss=0.01305, ecapa_loss=0.0001647, whisper_loss=0.0683, over 16760.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001763, whisper_loss=0.09317, over 3902190.92 frames. ], batch size: 69, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:36:14,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1637210.0, ans=0.0 2024-08-12 12:36:15,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.507e+01 2.747e+01 3.144e+01 4.891e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 12:36:22,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1637210.0, ans=0.125 2024-08-12 12:36:45,810 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 12:36:48,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1637410.0, ans=0.125 2024-08-12 12:36:52,092 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.820e+05 2024-08-12 12:37:06,720 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 12:37:08,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1637510.0, ans=0.2 2024-08-12 12:37:16,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4350, loss[loss=0.1005, beats_loss=0.0133, ecapa_loss=0.0001263, whisper_loss=0.08597, over 23041.00 frames. 
], tot_loss[loss=0.1057, beats_loss=0.01093, ecapa_loss=0.0001778, whisper_loss=0.093, over 3891981.52 frames. ], batch size: 88, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:37:47,529 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 12:37:55,427 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 12:37:59,967 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 12:38:06,697 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 12:38:13,373 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 12:38:18,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1637910.0, ans=0.1 2024-08-12 12:38:23,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1638010.0, ans=0.0 2024-08-12 12:38:35,500 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 12:38:36,748 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4400, loss[loss=0.1054, beats_loss=0.01127, ecapa_loss=0.0001659, whisper_loss=0.09252, over 19610.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01084, ecapa_loss=0.0001785, whisper_loss=0.09353, over 3869968.13 frames. ], batch size: 79, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:38:44,253 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 12:38:55,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.518e+01 2.794e+01 3.242e+01 9.315e+01, threshold=5.589e+01, percent-clipped=2.0 2024-08-12 12:39:26,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1638410.0, ans=0.1 2024-08-12 12:39:35,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1638410.0, ans=0.125 2024-08-12 12:39:41,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-12 12:39:43,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1638510.0, ans=0.95 2024-08-12 12:39:48,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1638510.0, ans=0.125 2024-08-12 12:39:59,148 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4450, loss[loss=0.07012, beats_loss=0.01155, ecapa_loss=0.0001657, whisper_loss=0.05691, over 15479.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01083, ecapa_loss=0.0001781, whisper_loss=0.09368, over 3888332.45 frames. 
], batch size: 63, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:40:10,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1638610.0, ans=0.0 2024-08-12 12:40:23,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1638710.0, ans=0.125 2024-08-12 12:40:29,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1638710.0, ans=0.125 2024-08-12 12:40:33,477 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 12:40:38,353 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 12:40:43,352 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 12:41:05,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1639010.0, ans=0.125 2024-08-12 12:41:06,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1639010.0, ans=0.125 2024-08-12 12:41:14,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1639010.0, ans=0.125 2024-08-12 12:41:19,610 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4500, loss[loss=0.08102, beats_loss=0.01171, ecapa_loss=0.0001613, whisper_loss=0.0677, over 19513.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01083, ecapa_loss=0.0001779, whisper_loss=0.09316, over 3912808.05 frames. 
], batch size: 79, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:41:37,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.496e+01 3.007e+01 3.529e+01 6.889e+01, threshold=6.014e+01, percent-clipped=3.0 2024-08-12 12:41:45,583 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 12:41:54,414 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:41:59,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1639310.0, ans=0.125 2024-08-12 12:42:17,671 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-12 12:42:36,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1639510.0, ans=0.2 2024-08-12 12:42:38,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4550, loss[loss=0.1093, beats_loss=0.01138, ecapa_loss=0.0001796, whisper_loss=0.09616, over 23204.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001786, whisper_loss=0.09241, over 3949836.83 frames. ], batch size: 95, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:42:45,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1639610.0, ans=0.0 2024-08-12 12:42:49,105 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 12:42:51,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1639610.0, ans=0.125 2024-08-12 12:43:05,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1639710.0, ans=0.125 2024-08-12 12:43:15,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1639810.0, ans=0.2 2024-08-12 12:43:19,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1639810.0, ans=0.125 2024-08-12 12:43:40,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1639910.0, ans=0.125 2024-08-12 12:43:45,641 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 12:43:57,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1640110.0, ans=0.0 2024-08-12 12:43:57,990 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4600, loss[loss=0.1105, beats_loss=0.01088, ecapa_loss=0.0001682, whisper_loss=0.0979, over 20551.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.000179, whisper_loss=0.09184, over 3926371.08 frames. ], batch size: 81, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:44:14,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.449e+01 2.715e+01 3.086e+01 6.580e+01, threshold=5.431e+01, percent-clipped=1.0 2024-08-12 12:44:29,568 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 12:44:55,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1640410.0, ans=0.125 2024-08-12 12:45:11,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1640510.0, ans=0.125 2024-08-12 12:45:16,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4650, loss[loss=0.07831, beats_loss=0.01212, ecapa_loss=0.0002023, whisper_loss=0.06416, over 14536.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001792, whisper_loss=0.09211, over 3931373.20 frames. ], batch size: 59, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:45:28,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1640610.0, ans=0.125 2024-08-12 12:45:28,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=12.0 2024-08-12 12:46:05,096 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 12:46:16,187 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 12:46:36,289 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4700, loss[loss=0.1012, beats_loss=0.01172, ecapa_loss=0.0002085, whisper_loss=0.08744, over 18553.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01111, ecapa_loss=0.0001781, whisper_loss=0.0914, over 3916135.65 frames. ], batch size: 79, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:46:46,861 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 12:46:53,762 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 12:46:54,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.509e+01 2.776e+01 3.112e+01 6.525e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 12:47:03,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1641210.0, ans=0.125 2024-08-12 12:47:08,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1641310.0, ans=0.125 2024-08-12 12:47:18,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1641310.0, ans=0.1 2024-08-12 12:47:44,519 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 12:47:55,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4750, loss[loss=0.1204, beats_loss=0.009086, ecapa_loss=0.000263, whisper_loss=0.1087, over 21029.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001789, whisper_loss=0.09182, over 3920403.89 frames. ], batch size: 89, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:48:03,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1641610.0, ans=0.125 2024-08-12 12:48:04,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1641610.0, ans=0.125 2024-08-12 12:48:16,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1641710.0, ans=0.0 2024-08-12 12:48:16,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. 
limit=15.0 2024-08-12 12:48:19,027 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 12:48:31,381 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 12:48:42,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1641910.0, ans=0.1 2024-08-12 12:48:49,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=23.51 vs. limit=22.5 2024-08-12 12:49:08,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2024-08-12 12:49:10,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4800, loss[loss=0.07018, beats_loss=0.01565, ecapa_loss=0.0001401, whisper_loss=0.05313, over 21723.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01111, ecapa_loss=0.00018, whisper_loss=0.09098, over 3896351.67 frames. ], batch size: 93, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:49:11,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1642110.0, ans=0.09899494936611666 2024-08-12 12:49:28,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.489e+01 2.813e+01 3.178e+01 7.863e+01, threshold=5.627e+01, percent-clipped=2.0 2024-08-12 12:49:38,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. 
limit=6.0 2024-08-12 12:49:50,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1642310.0, ans=0.2 2024-08-12 12:49:56,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1642410.0, ans=0.09899494936611666 2024-08-12 12:50:02,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=22.5 2024-08-12 12:50:28,306 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4850, loss[loss=0.09683, beats_loss=0.0116, ecapa_loss=0.0001771, whisper_loss=0.08347, over 13383.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.0001784, whisper_loss=0.09184, over 3900888.54 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:50:38,317 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 12:50:49,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-12 12:51:14,559 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 40 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-12 12:51:46,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1643110.0, ans=0.1 2024-08-12 12:51:47,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4900, loss[loss=0.1164, beats_loss=0.01183, ecapa_loss=0.0001401, whisper_loss=0.1032, over 15625.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001771, whisper_loss=0.09148, over 3868999.18 frames. 
], batch size: 58, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:51:52,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1643110.0, ans=0.1 2024-08-12 12:51:58,167 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 12:51:58,562 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.163e+00 2024-08-12 12:52:03,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.565e+01 2.777e+01 3.230e+01 5.434e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 12:52:26,163 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 39 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 12:52:35,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1643410.0, ans=0.125 2024-08-12 12:52:38,938 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2024-08-12 12:52:39,704 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 12:52:54,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1643510.0, ans=15.0 2024-08-12 12:53:02,769 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 4950, loss[loss=0.1157, beats_loss=0.008504, ecapa_loss=0.0001693, whisper_loss=0.1055, over 22812.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001767, whisper_loss=0.09168, over 3841068.61 frames. 
], batch size: 89, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:53:06,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1643610.0, ans=0.125 2024-08-12 12:53:28,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1643710.0, ans=0.025 2024-08-12 12:53:39,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-12 12:53:41,098 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 10 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 12:53:44,186 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.185e-01 2024-08-12 12:53:47,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1643810.0, ans=0.125 2024-08-12 12:53:52,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-08-12 12:54:20,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5000, loss[loss=0.09158, beats_loss=0.01135, ecapa_loss=0.0001931, whisper_loss=0.0783, over 13910.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01112, ecapa_loss=0.0001763, whisper_loss=0.09108, over 3830727.70 frames. ], batch size: 58, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:54:23,315 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-12 12:54:36,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.385e+01 2.734e+01 3.105e+01 6.733e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-12 12:54:59,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1644310.0, ans=0.0 2024-08-12 12:55:16,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1644410.0, ans=0.025 2024-08-12 12:55:17,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2024-08-12 12:55:22,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1644510.0, ans=0.125 2024-08-12 12:55:24,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2024-08-12 12:55:37,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5050, loss[loss=0.1256, beats_loss=0.01155, ecapa_loss=0.0001801, whisper_loss=0.1123, over 21538.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01115, ecapa_loss=0.0001766, whisper_loss=0.09157, over 3857533.20 frames. ], batch size: 86, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:55:41,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1644610.0, ans=0.125 2024-08-12 12:55:59,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1644710.0, ans=0.1 2024-08-12 12:56:10,013 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 12:56:10,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1644810.0, ans=0.2 2024-08-12 12:56:25,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1644910.0, ans=0.125 2024-08-12 12:56:28,847 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 12:56:30,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1644910.0, ans=0.125 2024-08-12 12:56:34,671 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 12:56:39,444 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 12:56:39,845 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=22.5 2024-08-12 12:56:45,509 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 12:56:50,623 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 12:56:56,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5100, loss[loss=0.1192, beats_loss=0.007189, ecapa_loss=0.000214, whisper_loss=0.1099, over 19447.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.0001768, whisper_loss=0.09237, over 3874264.72 frames. 
], batch size: 74, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:57:13,069 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.597e+01 2.875e+01 3.428e+01 8.355e+01, threshold=5.751e+01, percent-clipped=1.0 2024-08-12 12:57:17,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1645210.0, ans=0.125 2024-08-12 12:57:20,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1645210.0, ans=0.125 2024-08-12 12:57:24,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1645210.0, ans=0.125 2024-08-12 12:57:36,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:37,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:50,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1645410.0, ans=0.0 2024-08-12 12:58:09,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1645510.0, ans=0.0 2024-08-12 12:58:12,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5150, loss[loss=0.09339, beats_loss=0.01185, ecapa_loss=0.0001591, whisper_loss=0.07995, over 18258.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01106, ecapa_loss=0.0001762, whisper_loss=0.09227, over 3889326.65 frames. 
], batch size: 73, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:58:20,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1645610.0, ans=0.2 2024-08-12 12:58:22,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-08-12 12:58:40,266 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 12:58:58,929 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 12:59:03,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1645910.0, ans=0.1 2024-08-12 12:59:08,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1645910.0, ans=0.125 2024-08-12 12:59:17,654 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 12:59:24,449 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5200, loss[loss=0.1169, beats_loss=0.009914, ecapa_loss=0.0001649, whisper_loss=0.1053, over 23888.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001755, whisper_loss=0.09248, over 3889068.42 frames. 
], batch size: 90, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:59:28,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1646110.0, ans=0.125 2024-08-12 12:59:36,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1646210.0, ans=0.125 2024-08-12 12:59:39,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.626e+01 2.923e+01 3.403e+01 3.236e+02, threshold=5.847e+01, percent-clipped=1.0 2024-08-12 12:59:46,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1646210.0, ans=0.125 2024-08-12 13:00:21,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2024-08-12 13:00:32,404 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5250, loss[loss=0.1146, beats_loss=0.01077, ecapa_loss=0.0002146, whisper_loss=0.1017, over 21167.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01106, ecapa_loss=0.0001762, whisper_loss=0.09208, over 3892861.26 frames. ], batch size: 88, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:00:51,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1646710.0, ans=0.125 2024-08-12 13:00:53,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1646710.0, ans=0.125 2024-08-12 13:00:54,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1646710.0, ans=0.1 2024-08-12 13:01:38,662 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5300, loss[loss=0.08778, beats_loss=0.01195, ecapa_loss=0.0001754, whisper_loss=0.07408, over 20745.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.0001761, whisper_loss=0.09169, over 3883684.39 frames. ], batch size: 86, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:01:40,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1647110.0, ans=0.2 2024-08-12 13:01:41,619 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 13:01:44,082 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 13:01:54,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.492e+01 2.766e+01 3.259e+01 2.039e+02, threshold=5.533e+01, percent-clipped=1.0 2024-08-12 13:02:11,433 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 21 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-12 13:02:16,614 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 31 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 13:02:26,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1647410.0, ans=0.1 2024-08-12 13:02:26,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-12 13:02:40,105 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 13:02:43,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1647610.0, ans=0.125 2024-08-12 13:02:43,812 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5350, loss[loss=0.1113, beats_loss=0.01194, ecapa_loss=0.0001552, whisper_loss=0.09783, over 15064.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001749, whisper_loss=0.0919, over 3854239.35 frames. 
], batch size: 57, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:02:57,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1647710.0, ans=0.2 2024-08-12 13:02:59,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2024-08-12 13:03:29,254 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 13:03:39,612 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 13:03:39,950 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.060e-01 2024-08-12 13:03:44,867 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-12 13:03:47,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1648110.0, ans=0.1 2024-08-12 13:03:48,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5400, loss[loss=0.09619, beats_loss=0.0114, ecapa_loss=0.0001674, whisper_loss=0.08311, over 17052.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.000175, whisper_loss=0.09261, over 3854955.47 frames. ], batch size: 69, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:03:49,865 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 30 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 13:03:50,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=15.0 2024-08-12 13:04:04,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.540e+01 2.809e+01 3.411e+01 5.713e+01, threshold=5.618e+01, percent-clipped=1.0 2024-08-12 13:04:06,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-12 13:04:13,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1648310.0, ans=0.125 2024-08-12 13:04:26,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1648310.0, ans=0.125 2024-08-12 13:04:32,612 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.771e+00 2024-08-12 13:04:35,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1648410.0, ans=10.0 2024-08-12 13:04:38,791 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 13:04:52,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1648510.0, ans=0.09899494936611666 2024-08-12 13:04:53,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1648610.0, ans=0.0 2024-08-12 13:04:54,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5450, loss[loss=0.1241, beats_loss=0.0104, ecapa_loss=0.0001498, whisper_loss=0.1122, over 18062.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001741, whisper_loss=0.09258, over 3841273.10 frames. 
], batch size: 68, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:05:23,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1648810.0, ans=0.125 2024-08-12 13:05:29,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-12 13:05:36,342 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-12 13:05:37,757 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 36 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 13:05:59,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5500, loss[loss=0.1217, beats_loss=0.01039, ecapa_loss=0.0001787, whisper_loss=0.1096, over 19522.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01103, ecapa_loss=0.0001729, whisper_loss=0.09268, over 3887852.43 frames. ], batch size: 78, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:06:02,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-12 13:06:02,243 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.67 vs. 
limit=22.5 2024-08-12 13:06:03,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1649110.0, ans=0.1 2024-08-12 13:06:11,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1649210.0, ans=0.1 2024-08-12 13:06:15,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.546e+01 2.808e+01 3.382e+01 4.653e+01, threshold=5.615e+01, percent-clipped=0.0 2024-08-12 13:06:21,419 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 13:06:22,472 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 13:06:23,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2024-08-12 13:06:24,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-12 13:06:37,746 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.824e-01 2024-08-12 13:07:05,585 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5550, loss[loss=0.1019, beats_loss=0.009788, ecapa_loss=0.000181, whisper_loss=0.09032, over 20775.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001743, whisper_loss=0.09239, over 3923197.74 frames. ], batch size: 80, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:07:10,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1649610.0, ans=0.125 2024-08-12 13:07:16,731 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 13:07:19,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5 2024-08-12 13:07:49,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1649910.0, ans=0.1 2024-08-12 13:07:51,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1649910.0, ans=0.025 2024-08-12 13:07:55,875 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 13:08:17,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-12 13:08:17,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2024-08-12 13:08:25,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1650110.0, ans=0.2 2024-08-12 13:08:26,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5600, loss[loss=0.1247, beats_loss=0.008853, ecapa_loss=0.0002407, whisper_loss=0.1134, over 22115.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01102, ecapa_loss=0.0001762, whisper_loss=0.09245, over 3892212.31 frames. ], batch size: 89, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:08:30,638 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 13:08:51,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.532e+01 2.834e+01 3.138e+01 6.030e+01, threshold=5.668e+01, percent-clipped=1.0 2024-08-12 13:09:00,301 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
23 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 13:09:39,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-12 13:09:55,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5650, loss[loss=0.08525, beats_loss=0.01232, ecapa_loss=0.000187, whisper_loss=0.07106, over 22154.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01111, ecapa_loss=0.0001754, whisper_loss=0.09154, over 3903354.09 frames. ], batch size: 93, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:10:15,909 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 13:10:41,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1650910.0, ans=0.1 2024-08-12 13:10:44,879 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 13:10:59,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1651010.0, ans=0.125 2024-08-12 13:11:01,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1651010.0, ans=0.05 2024-08-12 13:11:01,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-12 13:11:10,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1651010.0, ans=0.0 2024-08-12 13:11:13,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5700, loss[loss=0.116, beats_loss=0.008047, ecapa_loss=0.0002293, whisper_loss=0.1056, over 21857.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001757, whisper_loss=0.09195, over 3931615.22 frames. ], batch size: 89, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:11:25,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1651110.0, ans=0.2 2024-08-12 13:11:26,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.27 vs. limit=5.0 2024-08-12 13:11:31,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.530e+01 2.812e+01 3.253e+01 9.696e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 13:11:40,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1651210.0, ans=0.0 2024-08-12 13:11:51,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-08-12 13:12:01,477 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 13:12:30,594 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5750, loss[loss=0.09421, beats_loss=0.01113, ecapa_loss=0.0001453, whisper_loss=0.08163, over 18059.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.0001763, whisper_loss=0.09239, over 3915939.69 frames. ], batch size: 70, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:12:44,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-12 13:12:45,298 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 13:12:53,352 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 13:13:12,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1651810.0, ans=0.1 2024-08-12 13:13:16,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1651910.0, ans=0.125 2024-08-12 13:13:28,087 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 12 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 13:13:35,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1652010.0, ans=0.0 2024-08-12 13:13:43,054 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 13:13:45,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5800, loss[loss=0.1003, beats_loss=0.009501, ecapa_loss=0.0001999, whisper_loss=0.08883, over 22899.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001771, whisper_loss=0.0926, over 3890701.16 frames. ], batch size: 94, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:13:59,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1652110.0, ans=0.125 2024-08-12 13:14:02,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1652210.0, ans=0.2 2024-08-12 13:14:04,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.480e+01 2.682e+01 3.175e+01 5.563e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-12 13:14:27,213 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 13:14:43,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. 
limit=15.0 2024-08-12 13:14:47,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1652510.0, ans=0.2 2024-08-12 13:14:55,348 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 13:14:57,119 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 13:15:05,757 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5850, loss[loss=0.1117, beats_loss=0.01034, ecapa_loss=0.0001815, whisper_loss=0.09953, over 18410.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001771, whisper_loss=0.09252, over 3907732.58 frames. ], batch size: 74, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:15:09,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1652610.0, ans=0.0 2024-08-12 13:15:38,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-08-12 13:15:46,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1652810.0, ans=0.0 2024-08-12 13:15:55,659 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 13:16:08,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1653010.0, ans=0.125 2024-08-12 13:16:25,206 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5900, loss[loss=0.07163, beats_loss=0.01365, ecapa_loss=0.0001858, whisper_loss=0.05613, over 16236.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001762, whisper_loss=0.0914, over 3874222.46 frames. 
], batch size: 69, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:16:25,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1653110.0, ans=0.0 2024-08-12 13:16:38,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2024-08-12 13:16:43,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.413e+01 2.728e+01 2.999e+01 4.140e+01, threshold=5.456e+01, percent-clipped=0.0 2024-08-12 13:16:53,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2024-08-12 13:16:56,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1653310.0, ans=0.125 2024-08-12 13:17:01,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1653310.0, ans=0.2 2024-08-12 13:17:04,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1653310.0, ans=0.125 2024-08-12 13:17:07,403 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 13:17:20,162 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 13:17:26,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1653510.0, ans=0.125 2024-08-12 13:17:26,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1653510.0, ans=0.5 2024-08-12 13:17:27,603 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 13:17:42,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1653610.0, ans=0.125 2024-08-12 13:17:43,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 5950, loss[loss=0.08755, beats_loss=0.01462, ecapa_loss=0.000119, whisper_loss=0.07174, over 14324.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01113, ecapa_loss=0.0001754, whisper_loss=0.09122, over 3880749.31 frames. ], batch size: 54, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:17:46,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1653610.0, ans=0.2 2024-08-12 13:17:57,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-12 13:18:00,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1653710.0, ans=0.125 2024-08-12 13:18:09,486 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-12 13:18:09,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1653710.0, ans=0.125 2024-08-12 13:18:14,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1653810.0, ans=0.0 2024-08-12 13:18:14,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1653810.0, ans=0.125 2024-08-12 13:18:14,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-12 13:18:15,456 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
29 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 13:18:19,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-12 13:18:35,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1653910.0, ans=0.125 2024-08-12 13:18:54,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1654010.0, ans=0.125 2024-08-12 13:19:03,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6000, loss[loss=0.08283, beats_loss=0.01025, ecapa_loss=0.0001824, whisper_loss=0.07075, over 20562.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001761, whisper_loss=0.09169, over 3888462.14 frames. ], batch size: 80, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:19:03,584 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 13:19:40,054 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005888, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 13:19:58,274 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on SV_voxceleb1: loss=0.004729, beats_loss=0, ecapa_loss=0.0004729, whisper_loss=0, over 939242.00 frames. 2024-08-12 13:20:08,715 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1172, 1.7530, 1.7716, 1.7915], device='cuda:1') 2024-08-12 13:21:43,829 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on AT_audioset: loss=0.02432, beats_loss=0.02432, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 13:21:43,833 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 13:22:03,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.605e+01 2.854e+01 3.270e+01 6.510e+01, threshold=5.707e+01, percent-clipped=1.0 2024-08-12 13:22:05,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1654210.0, ans=0.0 2024-08-12 13:22:13,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1654210.0, ans=0.125 2024-08-12 13:22:14,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1654310.0, ans=0.125 2024-08-12 13:22:17,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2024-08-12 13:22:20,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1654310.0, ans=0.0 2024-08-12 13:22:29,631 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 13:22:32,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1654410.0, ans=0.0 2024-08-12 13:22:32,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1654410.0, ans=0.2 2024-08-12 13:22:33,241 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 13:22:37,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. 
limit=15.0 2024-08-12 13:22:51,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1654510.0, ans=0.0 2024-08-12 13:22:51,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1654510.0, ans=0.2 2024-08-12 13:23:02,029 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 20 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 13:23:02,942 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6050, loss[loss=0.08724, beats_loss=0.01345, ecapa_loss=0.0001189, whisper_loss=0.0726, over 21152.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0112, ecapa_loss=0.0001759, whisper_loss=0.09143, over 3887838.25 frames. ], batch size: 82, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:23:18,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1654710.0, ans=0.125 2024-08-12 13:23:20,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-08-12 13:23:31,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1654710.0, ans=0.125 2024-08-12 13:23:33,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1654710.0, ans=0.125 2024-08-12 13:23:37,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1654810.0, ans=0.2 2024-08-12 13:23:42,475 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 13:23:52,973 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 13:23:56,172 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
30 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 13:23:59,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1654910.0, ans=0.125 2024-08-12 13:24:03,465 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 13:24:09,911 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 13:24:11,659 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 13:24:22,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1655110.0, ans=0.035 2024-08-12 13:24:23,332 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6100, loss[loss=0.09519, beats_loss=0.01151, ecapa_loss=0.0002023, whisper_loss=0.08166, over 18805.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01111, ecapa_loss=0.0001765, whisper_loss=0.09163, over 3863694.27 frames. ], batch size: 78, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:24:32,794 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 13:24:34,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1655110.0, ans=0.125 2024-08-12 13:24:38,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1655210.0, ans=0.1 2024-08-12 13:24:40,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. 
limit=15.0 2024-08-12 13:24:42,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.433e+01 2.687e+01 2.996e+01 4.596e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-12 13:24:42,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1655210.0, ans=0.0 2024-08-12 13:25:25,335 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-12 13:25:33,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=12.0 2024-08-12 13:25:42,841 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6150, loss[loss=0.11, beats_loss=0.01102, ecapa_loss=0.0001752, whisper_loss=0.09723, over 18226.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01112, ecapa_loss=0.0001767, whisper_loss=0.09143, over 3884375.37 frames. ], batch size: 71, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:26:18,227 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 13:26:23,892 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 13:26:54,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-12 13:27:01,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6200, loss[loss=0.09426, beats_loss=0.01293, ecapa_loss=0.0001648, whisper_loss=0.07968, over 16028.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01114, ecapa_loss=0.0001754, whisper_loss=0.09152, over 3880935.43 frames. 
], batch size: 62, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:27:21,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.566e+01 2.980e+01 3.459e+01 1.302e+02, threshold=5.960e+01, percent-clipped=3.0 2024-08-12 13:27:30,753 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-12 13:27:35,766 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 13:27:49,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-08-12 13:27:57,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1656410.0, ans=0.1 2024-08-12 13:28:04,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.61 vs. limit=10.0 2024-08-12 13:28:20,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6250, loss[loss=0.09481, beats_loss=0.01299, ecapa_loss=0.0001413, whisper_loss=0.08041, over 17468.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01113, ecapa_loss=0.0001751, whisper_loss=0.09093, over 3888314.15 frames. ], batch size: 70, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:28:20,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1656610.0, ans=0.1 2024-08-12 13:28:35,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1656710.0, ans=0.125 2024-08-12 13:28:55,123 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 13:28:55,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1656810.0, ans=0.125 2024-08-12 13:29:00,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1656810.0, ans=0.125 2024-08-12 13:29:06,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1656910.0, ans=0.0 2024-08-12 13:29:15,757 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 13:29:17,558 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 13:29:23,075 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 30 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 13:29:28,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-08-12 13:29:29,501 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 13:29:34,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1657010.0, ans=0.125 2024-08-12 13:29:38,529 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6300, loss[loss=0.0804, beats_loss=0.013, ecapa_loss=0.000189, whisper_loss=0.06551, over 19140.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01112, ecapa_loss=0.0001746, whisper_loss=0.0911, over 3852036.04 frames. ], batch size: 81, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:29:43,913 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 13:29:47,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=12.0 2024-08-12 13:29:49,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1657110.0, ans=0.0 2024-08-12 13:29:56,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.533e+01 2.764e+01 3.173e+01 6.844e+01, threshold=5.528e+01, percent-clipped=1.0 2024-08-12 13:30:19,598 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-12 13:30:19,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1657310.0, ans=0.0 2024-08-12 13:30:33,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1657410.0, ans=0.0 2024-08-12 13:30:36,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1657410.0, ans=0.0 2024-08-12 13:30:54,843 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6350, loss[loss=0.09097, beats_loss=0.01381, ecapa_loss=0.0001616, whisper_loss=0.07554, over 16591.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01111, ecapa_loss=0.0001756, whisper_loss=0.09078, over 3857723.40 frames. ], batch size: 66, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:30:56,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1657610.0, ans=0.125 2024-08-12 13:31:00,559 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 13:31:05,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1657610.0, ans=0.1 2024-08-12 13:31:08,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2024-08-12 13:31:11,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1657710.0, ans=0.0 2024-08-12 13:31:14,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1657710.0, ans=0.5 2024-08-12 13:31:16,240 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 13:31:23,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1657810.0, ans=0.125 2024-08-12 13:31:34,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1657810.0, ans=0.125 2024-08-12 13:31:41,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1657910.0, ans=0.2 2024-08-12 13:31:49,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1657910.0, ans=0.125 2024-08-12 13:31:50,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1657910.0, ans=0.125 2024-08-12 13:32:02,470 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 13:32:11,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6400, loss[loss=0.104, beats_loss=0.01175, ecapa_loss=0.0001736, whisper_loss=0.09055, over 21392.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01111, ecapa_loss=0.0001767, whisper_loss=0.09135, over 3896248.58 frames. ], batch size: 88, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:32:11,184 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 38 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 13:32:29,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.462e+01 2.715e+01 3.060e+01 4.478e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 13:32:40,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1658310.0, ans=0.0 2024-08-12 13:32:41,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-12 13:32:43,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1658310.0, ans=0.125 2024-08-12 13:32:46,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1658310.0, ans=0.1 2024-08-12 13:33:26,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6450, loss[loss=0.09161, beats_loss=0.01101, ecapa_loss=0.000168, whisper_loss=0.07893, over 19400.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001765, whisper_loss=0.09159, over 3909181.08 frames. ], batch size: 78, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:33:54,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1658710.0, ans=0.0 2024-08-12 13:34:06,747 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
26 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-12 13:34:17,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1658910.0, ans=0.125 2024-08-12 13:34:20,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1658910.0, ans=0.1 2024-08-12 13:34:41,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6500, loss[loss=0.1122, beats_loss=0.009367, ecapa_loss=0.000198, whisper_loss=0.1009, over 20800.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01105, ecapa_loss=0.0001772, whisper_loss=0.09236, over 3926853.53 frames. ], batch size: 84, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:34:46,179 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 13:34:58,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.642e+01 2.943e+01 3.228e+01 1.281e+02, threshold=5.885e+01, percent-clipped=1.0 2024-08-12 13:35:10,942 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:35:16,408 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 13:35:18,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1659310.0, ans=0.125 2024-08-12 13:35:22,289 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-12 13:35:25,600 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 13:35:33,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1659410.0, ans=0.125 2024-08-12 13:35:44,887 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
17 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 13:35:55,376 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6550, loss[loss=0.1136, beats_loss=0.01125, ecapa_loss=0.0001261, whisper_loss=0.1011, over 15279.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001764, whisper_loss=0.09267, over 3898051.43 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:36:00,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1659610.0, ans=0.1 2024-08-12 13:36:02,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1659610.0, ans=0.125 2024-08-12 13:36:17,388 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 13:36:37,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1659810.0, ans=0.1 2024-08-12 13:36:43,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-12 13:36:55,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1660010.0, ans=0.0 2024-08-12 13:37:01,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1660010.0, ans=0.0 2024-08-12 13:37:05,923 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-12 13:37:10,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6600, loss[loss=0.1026, beats_loss=0.01219, ecapa_loss=0.0001673, whisper_loss=0.08877, over 22562.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01094, ecapa_loss=0.0001766, whisper_loss=0.09332, over 3935116.79 frames. 
], batch size: 92, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:37:23,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. limit=10.0 2024-08-12 13:37:28,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.664e+01 3.043e+01 3.449e+01 7.276e+01, threshold=6.087e+01, percent-clipped=1.0 2024-08-12 13:37:46,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1660310.0, ans=0.125 2024-08-12 13:38:05,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1660410.0, ans=0.0 2024-08-12 13:38:22,841 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 13:38:23,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6650, loss[loss=0.123, beats_loss=0.009346, ecapa_loss=0.0001757, whisper_loss=0.1119, over 22865.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01092, ecapa_loss=0.0001773, whisper_loss=0.09364, over 3932656.70 frames. ], batch size: 88, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:38:29,659 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 13:38:32,687 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-12 13:38:34,074 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 13:38:35,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1660610.0, ans=0.0 2024-08-12 13:38:44,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1660710.0, ans=0.0 2024-08-12 13:38:50,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1660710.0, ans=0.2 2024-08-12 13:38:54,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-12 13:39:02,837 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-12 13:39:09,746 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 13:39:15,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1660910.0, ans=0.1 2024-08-12 13:39:24,243 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 13:39:30,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1661010.0, ans=0.95 2024-08-12 13:39:35,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6700, loss[loss=0.1217, beats_loss=0.01016, ecapa_loss=0.0001884, whisper_loss=0.1097, over 23481.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.000176, whisper_loss=0.09294, over 3918338.90 frames. 
], batch size: 91, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:39:51,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1661210.0, ans=0.2 2024-08-12 13:39:52,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.550e+01 2.931e+01 3.277e+01 4.693e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 13:39:56,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-12 13:40:01,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1661210.0, ans=0.125 2024-08-12 13:40:18,255 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 13:40:31,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.12 vs. limit=22.5 2024-08-12 13:40:31,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1661510.0, ans=0.125 2024-08-12 13:40:44,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1661510.0, ans=0.125 2024-08-12 13:40:47,694 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6750, loss[loss=0.09814, beats_loss=0.01205, ecapa_loss=0.0001933, whisper_loss=0.08416, over 18058.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001769, whisper_loss=0.09278, over 3903183.95 frames. 
], batch size: 75, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:40:53,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1661610.0, ans=0.0 2024-08-12 13:41:00,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1661710.0, ans=0.125 2024-08-12 13:41:20,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.71 vs. limit=5.0 2024-08-12 13:41:23,603 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 13:41:44,994 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 13:41:50,329 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 13:41:53,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1662010.0, ans=0.2 2024-08-12 13:41:58,502 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6800, loss[loss=0.1167, beats_loss=0.01161, ecapa_loss=0.0001932, whisper_loss=0.1031, over 16151.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01095, ecapa_loss=0.0001777, whisper_loss=0.09272, over 3871480.07 frames. ], batch size: 64, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:42:14,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.516e+01 2.723e+01 3.027e+01 3.885e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 13:42:18,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1662210.0, ans=10.0 2024-08-12 13:42:30,620 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 13:42:41,484 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-12 13:43:06,018 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 13:43:08,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6850, loss[loss=0.1097, beats_loss=0.01099, ecapa_loss=0.0001596, whisper_loss=0.09713, over 22626.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001778, whisper_loss=0.09202, over 3851651.25 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:43:12,621 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 13:43:41,055 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 13:43:46,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1662810.0, ans=0.125 2024-08-12 13:43:55,646 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 13:44:12,064 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-12 13:44:19,363 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6900, loss[loss=0.1046, beats_loss=0.008403, ecapa_loss=0.0001845, whisper_loss=0.09431, over 14690.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01101, ecapa_loss=0.0001772, whisper_loss=0.09199, over 3861761.40 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:44:24,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2024-08-12 13:44:26,546 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 13:44:29,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1663110.0, ans=0.125 2024-08-12 13:44:35,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.437e+01 2.665e+01 2.983e+01 5.492e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-12 13:44:37,441 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-12 13:44:44,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1663210.0, ans=0.125 2024-08-12 13:44:46,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1663310.0, ans=0.125 2024-08-12 13:44:49,207 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:44:59,721 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 13:45:10,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1663410.0, ans=0.1 2024-08-12 13:45:24,317 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-12 13:45:29,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 6950, loss[loss=0.1045, beats_loss=0.009944, ecapa_loss=0.0001434, whisper_loss=0.09313, over 14487.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001759, whisper_loss=0.0929, over 3843365.14 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:45:40,560 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.386e-03 2024-08-12 13:45:52,151 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 13:45:58,370 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 13:46:02,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1663810.0, ans=0.125 2024-08-12 13:46:02,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1663810.0, ans=0.125 2024-08-12 13:46:06,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-12 13:46:09,669 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-12 13:46:12,466 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 13:46:22,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1663910.0, ans=0.0 2024-08-12 13:46:23,711 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 20 from LS+wenet, 36 from Vox, 36 fro AS 2024-08-12 13:46:28,724 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-12 13:46:33,096 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 43 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 13:46:42,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7000, loss[loss=0.06681, beats_loss=0.01247, ecapa_loss=0.0002079, whisper_loss=0.05227, over 14127.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01096, ecapa_loss=0.0001779, whisper_loss=0.09307, over 3847031.83 frames. ], batch size: 61, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:46:50,564 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 13:46:53,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1664110.0, ans=0.1 2024-08-12 13:46:54,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1664110.0, ans=0.02 2024-08-12 13:46:54,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1664110.0, ans=0.125 2024-08-12 13:47:00,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.453e+01 2.767e+01 3.334e+01 1.862e+02, threshold=5.533e+01, percent-clipped=4.0 2024-08-12 13:47:26,422 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 13:47:31,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1664410.0, ans=0.125 2024-08-12 13:47:39,552 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-12 13:47:41,888 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.654e-01 2024-08-12 13:47:43,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1664510.0, ans=0.125 2024-08-12 13:47:46,981 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 13:47:51,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1664510.0, ans=0.2 2024-08-12 13:47:55,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7050, loss[loss=0.08215, beats_loss=0.01413, ecapa_loss=0.0001591, whisper_loss=0.06643, over 21492.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01107, ecapa_loss=0.0001768, whisper_loss=0.09231, over 3866548.31 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:47:55,591 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 13:48:05,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1664610.0, ans=0.0 2024-08-12 13:48:27,571 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-12 13:48:41,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1664910.0, ans=0.125 2024-08-12 13:49:08,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-12 13:49:09,043 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7100, loss[loss=0.1148, beats_loss=0.01164, ecapa_loss=0.0001446, whisper_loss=0.1017, over 19917.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001752, whisper_loss=0.09165, over 3858804.72 frames. ], batch size: 77, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:49:09,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1665110.0, ans=0.5 2024-08-12 13:49:12,521 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 13:49:23,700 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 13:49:24,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1665210.0, ans=0.09899494936611666 2024-08-12 13:49:25,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.512e+01 2.817e+01 3.133e+01 5.318e+01, threshold=5.634e+01, percent-clipped=0.0 2024-08-12 13:49:34,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1665210.0, ans=0.04949747468305833 2024-08-12 13:49:36,685 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 13:49:56,397 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 13:50:04,974 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-12 13:50:18,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1665510.0, ans=0.125 2024-08-12 13:50:22,320 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7150, loss[loss=0.09829, beats_loss=0.01221, ecapa_loss=0.0001616, whisper_loss=0.08445, over 21639.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001746, whisper_loss=0.09191, over 3878903.89 frames. ], batch size: 88, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:50:33,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-12 13:50:42,009 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 13:51:00,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1665810.0, ans=0.1 2024-08-12 13:51:02,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1665810.0, ans=0.0 2024-08-12 13:51:13,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1665910.0, ans=0.125 2024-08-12 13:51:13,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2024-08-12 13:51:35,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1666110.0, ans=0.1 2024-08-12 13:51:36,171 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7200, loss[loss=0.09221, beats_loss=0.01119, ecapa_loss=0.0001626, whisper_loss=0.0794, over 13965.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.000175, whisper_loss=0.09214, over 3874323.65 frames. 
], batch size: 55, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:51:45,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1666110.0, ans=0.0 2024-08-12 13:51:53,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.582e+01 2.995e+01 3.267e+01 4.717e+01, threshold=5.989e+01, percent-clipped=0.0 2024-08-12 13:52:02,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1666210.0, ans=0.0 2024-08-12 13:52:31,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1666410.0, ans=0.0 2024-08-12 13:52:32,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1666410.0, ans=0.1 2024-08-12 13:52:33,655 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 13:52:39,043 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 13:52:48,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7250, loss[loss=0.1179, beats_loss=0.01101, ecapa_loss=0.0001556, whisper_loss=0.1053, over 23440.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001762, whisper_loss=0.09259, over 3881847.75 frames. ], batch size: 94, lr: 5.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:53:03,349 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 13:53:05,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1666710.0, ans=0.0 2024-08-12 13:53:11,947 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 13:53:23,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1666810.0, ans=0.1 2024-08-12 13:53:35,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1666910.0, ans=0.015 2024-08-12 13:53:38,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1666910.0, ans=0.125 2024-08-12 13:53:48,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=15.0 2024-08-12 13:53:49,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1667010.0, ans=0.2 2024-08-12 13:54:01,104 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 13:54:05,188 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7300, loss[loss=0.1087, beats_loss=0.01022, ecapa_loss=0.000212, whisper_loss=0.09632, over 17004.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001754, whisper_loss=0.09291, over 3862862.32 frames. 
], batch size: 68, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:54:08,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1667110.0, ans=0.0 2024-08-12 13:54:09,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1667110.0, ans=0.125 2024-08-12 13:54:11,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1667110.0, ans=0.0 2024-08-12 13:54:16,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2024-08-12 13:54:24,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-08-12 13:54:24,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.453e+01 2.736e+01 3.058e+01 4.580e+01, threshold=5.471e+01, percent-clipped=0.0 2024-08-12 13:54:32,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.36 vs. limit=22.5 2024-08-12 13:54:42,076 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 13:54:54,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2024-08-12 13:55:09,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=15.0 2024-08-12 13:55:09,713 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 13:55:11,177 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 13:55:16,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-12 13:55:23,498 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7350, loss[loss=0.09352, beats_loss=0.01586, ecapa_loss=0.000123, whisper_loss=0.07643, over 14080.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01089, ecapa_loss=0.0001754, whisper_loss=0.09333, over 3881571.64 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:55:35,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2024-08-12 13:55:41,387 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 13:55:56,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1667810.0, ans=0.125 2024-08-12 13:56:01,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-12 13:56:10,115 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 13:56:10,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1667910.0, ans=0.1 2024-08-12 13:56:13,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1667910.0, ans=0.0 2024-08-12 13:56:13,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1667910.0, ans=0.1 2024-08-12 13:56:15,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1667910.0, ans=0.125 2024-08-12 13:56:28,913 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 13:56:41,159 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7400, loss[loss=0.09143, beats_loss=0.01476, ecapa_loss=0.0001202, whisper_loss=0.07548, over 14212.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01089, ecapa_loss=0.0001749, whisper_loss=0.09322, over 3867396.19 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:56:41,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1668110.0, ans=0.0 2024-08-12 13:56:46,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1668110.0, ans=0.0 2024-08-12 13:56:58,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.596e+01 2.915e+01 3.233e+01 4.650e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-12 13:57:10,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1668310.0, ans=0.0 2024-08-12 13:57:24,923 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-12 13:57:55,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7450, loss[loss=0.114, beats_loss=0.0097, ecapa_loss=0.0001795, whisper_loss=0.1025, over 23390.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0109, ecapa_loss=0.0001764, whisper_loss=0.09326, over 3865597.10 frames. ], batch size: 92, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:58:19,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1668710.0, ans=0.125 2024-08-12 13:58:40,682 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 13:58:43,728 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 13:58:50,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1668910.0, ans=0.125 2024-08-12 13:58:51,317 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 13:58:51,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1668910.0, ans=0.0 2024-08-12 13:58:59,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1669010.0, ans=0.0 2024-08-12 13:59:10,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1669010.0, ans=0.1 2024-08-12 13:59:12,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7500, loss[loss=0.09529, beats_loss=0.01043, ecapa_loss=0.0001872, whisper_loss=0.08299, over 16850.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01092, ecapa_loss=0.0001776, whisper_loss=0.09341, over 3880714.52 frames. 
], batch size: 69, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:59:21,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1669110.0, ans=0.95 2024-08-12 13:59:23,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1669110.0, ans=0.2 2024-08-12 13:59:27,964 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 13:59:30,395 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.570e+01 2.878e+01 3.293e+01 5.497e+01, threshold=5.755e+01, percent-clipped=0.0 2024-08-12 13:59:31,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1669210.0, ans=0.0 2024-08-12 13:59:46,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1669310.0, ans=0.2 2024-08-12 13:59:55,953 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 13:59:58,915 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 14:00:00,207 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 14:00:00,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1669410.0, ans=0.0 2024-08-12 14:00:03,044 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 14:00:05,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1669410.0, ans=0.125 2024-08-12 14:00:07,189 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 14:00:15,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1669510.0, ans=0.07 2024-08-12 14:00:16,185 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 14:00:26,370 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7550, loss[loss=0.09255, beats_loss=0.01252, ecapa_loss=0.0001522, whisper_loss=0.0785, over 17708.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001772, whisper_loss=0.09249, over 3823959.42 frames. ], batch size: 70, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:00:30,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1669610.0, ans=0.125 2024-08-12 14:00:34,478 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-12 14:00:37,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1669610.0, ans=0.125 2024-08-12 14:00:42,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-12 14:00:43,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1669710.0, ans=0.125 2024-08-12 14:00:56,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1669810.0, ans=0.1 2024-08-12 14:01:00,542 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 14:01:11,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1669910.0, ans=0.2 2024-08-12 14:01:19,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1669910.0, ans=0.125 2024-08-12 14:01:20,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1669910.0, ans=0.0 2024-08-12 14:01:31,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1670010.0, ans=0.125 2024-08-12 14:01:41,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7600, loss[loss=0.1023, beats_loss=0.009605, ecapa_loss=0.0001886, whisper_loss=0.09086, over 19130.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001763, whisper_loss=0.09266, over 3801855.78 frames. ], batch size: 75, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:01:59,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.500e+01 2.707e+01 3.102e+01 5.200e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 14:02:04,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. 
limit=10.0 2024-08-12 14:02:11,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1670310.0, ans=0.125 2024-08-12 14:02:22,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1670310.0, ans=0.0 2024-08-12 14:02:30,630 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:02:42,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2024-08-12 14:02:52,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1670510.0, ans=0.2 2024-08-12 14:02:54,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1670510.0, ans=0.1 2024-08-12 14:02:57,286 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7650, loss[loss=0.115, beats_loss=0.009162, ecapa_loss=0.0001896, whisper_loss=0.1039, over 22002.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.0001753, whisper_loss=0.09262, over 3806841.48 frames. ], batch size: 89, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:03:08,448 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 14:03:24,993 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 14:03:44,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1670910.0, ans=0.0 2024-08-12 14:03:51,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1670910.0, ans=0.125 2024-08-12 14:03:54,469 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 14:03:54,889 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-12 14:04:04,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1671010.0, ans=0.0 2024-08-12 14:04:14,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7700, loss[loss=0.09545, beats_loss=0.0102, ecapa_loss=0.0001683, whisper_loss=0.08357, over 20797.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001765, whisper_loss=0.09235, over 3821118.59 frames. ], batch size: 83, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:04:16,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1671110.0, ans=0.125 2024-08-12 14:04:21,131 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:04:33,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.546e+01 2.810e+01 3.287e+01 1.654e+02, threshold=5.620e+01, percent-clipped=2.0 2024-08-12 14:04:59,905 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 14:05:11,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1671410.0, ans=0.125 2024-08-12 14:05:11,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1671410.0, ans=0.0 2024-08-12 14:05:14,919 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-12 14:05:31,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1671510.0, ans=0.125 2024-08-12 14:05:32,695 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 17 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-12 14:05:36,109 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7750, loss[loss=0.09196, beats_loss=0.01121, ecapa_loss=0.0001767, whisper_loss=0.07898, over 14783.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01085, ecapa_loss=0.0001774, whisper_loss=0.09216, over 3818679.03 frames. ], batch size: 60, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:05:42,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-08-12 14:05:47,120 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 14:05:54,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=15.0 2024-08-12 14:05:57,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1671710.0, ans=0.125 2024-08-12 14:06:15,653 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 14:06:18,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-12 14:06:58,686 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 14:07:02,804 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7800, loss[loss=0.1157, beats_loss=0.009742, ecapa_loss=0.0001755, whisper_loss=0.1042, over 18596.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.000176, whisper_loss=0.0916, over 3853268.19 frames. ], batch size: 72, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:07:07,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1672110.0, ans=0.0 2024-08-12 14:07:23,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.562e+01 2.777e+01 3.107e+01 5.363e+01, threshold=5.555e+01, percent-clipped=0.0 2024-08-12 14:07:35,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1672310.0, ans=0.1 2024-08-12 14:07:45,409 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.007e+00 2024-08-12 14:08:00,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-12 14:08:02,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-08-12 14:08:06,630 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 14:08:16,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1672510.0, ans=0.1 2024-08-12 14:08:22,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1672510.0, ans=0.1 2024-08-12 14:08:24,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1672510.0, ans=0.125 2024-08-12 14:08:28,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7850, loss[loss=0.1025, beats_loss=0.01023, ecapa_loss=0.0001739, whisper_loss=0.09053, over 22624.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001751, whisper_loss=0.09158, over 3871937.56 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:08:32,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1672610.0, ans=0.0 2024-08-12 14:09:05,987 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 14:09:12,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-08-12 14:09:13,855 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 36 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 14:09:59,047 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7900, loss[loss=0.1015, beats_loss=0.01144, ecapa_loss=0.000158, whisper_loss=0.08851, over 16666.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01092, ecapa_loss=0.000176, whisper_loss=0.09251, over 3870111.42 frames. ], batch size: 64, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:10:00,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1673110.0, ans=0.125 2024-08-12 14:10:17,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.710e+01 2.918e+01 3.314e+01 4.550e+01, threshold=5.837e+01, percent-clipped=0.0 2024-08-12 14:10:25,584 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-12 14:10:42,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1673310.0, ans=0.125 2024-08-12 14:11:12,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. 
limit=6.0 2024-08-12 14:11:18,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 7950, loss[loss=0.09307, beats_loss=0.01189, ecapa_loss=0.0001879, whisper_loss=0.0793, over 21808.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001756, whisper_loss=0.09218, over 3857061.17 frames. ], batch size: 90, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:11:39,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-12 14:11:49,689 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 14:12:38,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1674010.0, ans=0.0 2024-08-12 14:12:47,955 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8000, loss[loss=0.08843, beats_loss=0.01262, ecapa_loss=0.0001491, whisper_loss=0.07432, over 21653.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001751, whisper_loss=0.09251, over 3839442.76 frames. ], batch size: 87, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:12:58,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1674110.0, ans=0.2 2024-08-12 14:13:06,593 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.43 vs. 
limit=15.0 2024-08-12 14:13:07,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.597e+01 2.930e+01 3.466e+01 8.592e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 14:13:23,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1674310.0, ans=0.125 2024-08-12 14:13:26,781 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 29 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 14:13:28,936 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5 2024-08-12 14:13:46,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1674410.0, ans=0.125 2024-08-12 14:13:48,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-08-12 14:13:54,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1674410.0, ans=0.125 2024-08-12 14:14:09,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1674510.0, ans=0.0 2024-08-12 14:14:14,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1674510.0, ans=0.0 2024-08-12 14:14:16,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8050, loss[loss=0.09865, beats_loss=0.01186, ecapa_loss=0.0001595, whisper_loss=0.08519, over 19330.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.0001744, whisper_loss=0.09259, over 3862637.13 frames. 
], batch size: 79, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:14:18,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1674610.0, ans=0.07 2024-08-12 14:14:23,316 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5 2024-08-12 14:14:58,266 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 14:15:33,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1675010.0, ans=0.125 2024-08-12 14:15:34,750 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 14:15:43,366 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 14:15:47,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1675010.0, ans=0.125 2024-08-12 14:15:51,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8100, loss[loss=0.09056, beats_loss=0.01239, ecapa_loss=0.0001849, whisper_loss=0.07632, over 21322.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01107, ecapa_loss=0.0001744, whisper_loss=0.09158, over 3877485.28 frames. ], batch size: 92, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:15:51,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1675110.0, ans=0.2 2024-08-12 14:15:54,549 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 14:15:54,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1675110.0, ans=0.125 2024-08-12 14:15:58,177 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 14:16:01,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1675110.0, ans=0.0 2024-08-12 14:16:01,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1675110.0, ans=0.0 2024-08-12 14:16:12,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.342e+01 2.574e+01 2.867e+01 4.166e+01, threshold=5.148e+01, percent-clipped=0.0 2024-08-12 14:16:55,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2024-08-12 14:17:01,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1675510.0, ans=0.125 2024-08-12 14:17:03,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1675510.0, ans=0.125 2024-08-12 14:17:15,293 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 14:17:18,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8150, loss[loss=0.09815, beats_loss=0.01063, ecapa_loss=0.0001789, whisper_loss=0.08574, over 22267.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01105, ecapa_loss=0.0001746, whisper_loss=0.09117, over 3870460.38 frames. ], batch size: 89, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:17:52,893 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 14:18:07,236 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 14:18:12,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1675810.0, ans=0.0 2024-08-12 14:18:21,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1675910.0, ans=0.0 2024-08-12 14:18:27,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1675910.0, ans=0.2 2024-08-12 14:18:50,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8200, loss[loss=0.1096, beats_loss=0.007359, ecapa_loss=0.0002483, whisper_loss=0.09972, over 16851.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001761, whisper_loss=0.09096, over 3847954.34 frames. ], batch size: 69, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:19:00,986 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 14:19:02,165 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
23 from LS+wenet, 15 from Vox, 54 fro AS 2024-08-12 14:19:12,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.595e+01 2.929e+01 3.219e+01 5.675e+01, threshold=5.858e+01, percent-clipped=2.0 2024-08-12 14:19:19,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1676210.0, ans=0.125 2024-08-12 14:19:24,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1676310.0, ans=0.125 2024-08-12 14:19:29,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1676310.0, ans=0.2 2024-08-12 14:20:15,627 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8250, loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001956, whisper_loss=0.08887, over 13922.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001761, whisper_loss=0.09151, over 3843490.38 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:20:29,206 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 14:20:33,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1676710.0, ans=0.2 2024-08-12 14:20:37,370 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 14:20:37,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1676710.0, ans=0.125 2024-08-12 14:20:39,458 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 14:20:54,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. 
limit=15.0 2024-08-12 14:21:23,729 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 14:21:26,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1677010.0, ans=0.125 2024-08-12 14:21:29,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=12.0 2024-08-12 14:21:30,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1677010.0, ans=0.1 2024-08-12 14:21:42,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1677010.0, ans=0.2 2024-08-12 14:21:46,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8300, loss[loss=0.1154, beats_loss=0.007378, ecapa_loss=0.0002116, whisper_loss=0.1059, over 17385.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01097, ecapa_loss=0.0001743, whisper_loss=0.0922, over 3842415.68 frames. ], batch size: 70, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:21:48,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1677110.0, ans=0.125 2024-08-12 14:21:50,232 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 14:22:03,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1677210.0, ans=0.125 2024-08-12 14:22:06,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.461e+01 2.729e+01 3.210e+01 2.355e+02, threshold=5.459e+01, percent-clipped=3.0 2024-08-12 14:22:06,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1677210.0, ans=0.1 2024-08-12 14:22:24,069 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 14:22:24,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1677310.0, ans=0.2 2024-08-12 14:22:35,616 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 14:22:57,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1677510.0, ans=0.125 2024-08-12 14:23:00,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1677510.0, ans=0.1 2024-08-12 14:23:03,190 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 14:23:12,301 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8350, loss[loss=0.08791, beats_loss=0.01332, ecapa_loss=0.0001941, whisper_loss=0.07266, over 17055.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001751, whisper_loss=0.0923, over 3880379.96 frames. ], batch size: 73, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:23:20,814 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 14:23:22,613 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 14:23:32,517 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 14:23:32,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1677710.0, ans=0.0 2024-08-12 14:23:46,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1677810.0, ans=0.1 2024-08-12 14:24:13,016 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 14:24:15,060 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 14:24:26,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1678010.0, ans=0.0 2024-08-12 14:24:38,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8400, loss[loss=0.1037, beats_loss=0.01007, ecapa_loss=0.0002081, whisper_loss=0.09154, over 19004.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001759, whisper_loss=0.09251, over 3889279.28 frames. ], batch size: 78, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:24:43,463 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 14:24:56,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1678210.0, ans=0.125 2024-08-12 14:24:59,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.506e+01 2.766e+01 3.211e+01 4.644e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:25:08,152 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.862e-02 2024-08-12 14:25:10,197 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 14:25:11,326 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 14:25:24,043 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 14:26:01,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1678610.0, ans=0.0 2024-08-12 14:26:03,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8450, loss[loss=0.1101, beats_loss=0.01259, ecapa_loss=0.0001827, whisper_loss=0.09567, over 22473.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001771, whisper_loss=0.09249, over 3891808.12 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:26:12,979 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 14:26:20,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1678710.0, ans=0.1 2024-08-12 14:26:20,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1678710.0, ans=0.125 2024-08-12 14:26:33,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1678810.0, ans=0.0 2024-08-12 14:26:34,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2024-08-12 14:26:49,292 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 14:26:51,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. 
limit=15.0 2024-08-12 14:26:52,626 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-12 14:27:05,587 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 22 from LS+wenet, 19 from Vox, 54 fro AS 2024-08-12 14:27:18,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1679010.0, ans=0.1 2024-08-12 14:27:20,238 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:27:24,378 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8500, loss[loss=0.1162, beats_loss=0.009607, ecapa_loss=0.000214, whisper_loss=0.1044, over 21097.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01096, ecapa_loss=0.0001777, whisper_loss=0.09266, over 3904648.72 frames. ], batch size: 89, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:27:24,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1679110.0, ans=0.125 2024-08-12 14:27:26,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1679110.0, ans=0.125 2024-08-12 14:27:27,576 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 18 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-12 14:27:41,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.02 vs. 
limit=10.0 2024-08-12 14:27:44,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.527e+01 2.828e+01 3.185e+01 5.995e+01, threshold=5.655e+01, percent-clipped=1.0 2024-08-12 14:27:54,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1679210.0, ans=0.0 2024-08-12 14:27:59,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1679310.0, ans=0.125 2024-08-12 14:28:03,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1679310.0, ans=0.09899494936611666 2024-08-12 14:28:05,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2024-08-12 14:28:15,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1679310.0, ans=0.1 2024-08-12 14:28:18,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1679410.0, ans=0.125 2024-08-12 14:28:23,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1679410.0, ans=0.0 2024-08-12 14:28:31,992 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 14:28:32,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1679410.0, ans=0.125 2024-08-12 14:28:55,227 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8550, loss[loss=0.09165, beats_loss=0.01532, ecapa_loss=0.0001175, whisper_loss=0.07516, over 22115.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001762, whisper_loss=0.09223, over 3914430.30 frames. 
], batch size: 89, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:29:02,854 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 14:29:06,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1679610.0, ans=0.125 2024-08-12 14:29:20,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1679710.0, ans=0.125 2024-08-12 14:29:25,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.61 vs. limit=22.5 2024-08-12 14:29:34,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1679810.0, ans=0.0 2024-08-12 14:29:37,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1679810.0, ans=0.125 2024-08-12 14:30:23,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-08-12 14:30:25,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2024-08-12 14:30:27,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1680010.0, ans=0.0 2024-08-12 14:30:27,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1680010.0, ans=0.0 2024-08-12 14:30:32,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8600, loss[loss=0.08052, beats_loss=0.008965, ecapa_loss=0.0001999, whisper_loss=0.06956, over 14114.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001761, whisper_loss=0.09244, over 3908886.40 frames. ], batch size: 54, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:30:36,305 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 14:30:40,396 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 14:30:52,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1680210.0, ans=0.125 2024-08-12 14:30:55,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.576e+01 2.836e+01 3.188e+01 4.951e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-12 14:31:13,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1680310.0, ans=0.0 2024-08-12 14:31:18,474 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 20 from LS+wenet, 30 from Vox, 44 fro AS 2024-08-12 14:31:21,393 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:31:31,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=15.0 2024-08-12 14:31:38,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.96 vs. 
limit=22.5 2024-08-12 14:31:39,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1680510.0, ans=0.125 2024-08-12 14:31:48,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1680510.0, ans=0.1 2024-08-12 14:31:54,947 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8650, loss[loss=0.1044, beats_loss=0.01188, ecapa_loss=0.0001958, whisper_loss=0.0906, over 18928.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001771, whisper_loss=0.09233, over 3906008.63 frames. ], batch size: 78, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:32:09,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-12 14:32:14,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-08-12 14:32:17,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1680710.0, ans=0.2 2024-08-12 14:32:20,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-12 14:32:22,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1680810.0, ans=0.2 2024-08-12 14:32:26,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1680810.0, ans=0.0 2024-08-12 14:32:29,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1680810.0, ans=0.2 2024-08-12 14:32:36,784 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
33 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 14:33:00,169 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 14:33:05,920 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 14:33:07,976 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8700, loss[loss=0.08109, beats_loss=0.0119, ecapa_loss=0.0001976, whisper_loss=0.06721, over 17158.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001766, whisper_loss=0.09276, over 3911222.38 frames. ], batch size: 72, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:33:25,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.512e+01 2.777e+01 3.126e+01 4.363e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 14:33:27,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1681210.0, ans=0.125 2024-08-12 14:33:33,643 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 14:33:39,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1681310.0, ans=10.0 2024-08-12 14:34:19,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1681510.0, ans=0.07 2024-08-12 14:34:21,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8750, loss[loss=0.09269, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.08021, over 19035.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01088, ecapa_loss=0.0001764, whisper_loss=0.09248, over 3898177.87 frames. ], batch size: 78, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:34:40,627 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 14:34:45,137 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:34:46,136 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 14:34:46,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1681710.0, ans=0.0 2024-08-12 14:34:50,111 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 14:34:53,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-12 14:34:56,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1681810.0, ans=0.125 2024-08-12 14:35:05,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-12 14:35:20,118 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:35:27,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1682010.0, ans=0.1 2024-08-12 14:35:33,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8800, loss[loss=0.1056, beats_loss=0.01354, ecapa_loss=0.0001588, whisper_loss=0.09045, over 14893.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.0001754, whisper_loss=0.09193, over 3899438.57 frames. ], batch size: 57, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:35:42,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. 
limit=15.0 2024-08-12 14:35:44,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1682110.0, ans=0.015 2024-08-12 14:35:53,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.570e+01 2.828e+01 3.387e+01 1.190e+02, threshold=5.656e+01, percent-clipped=1.0 2024-08-12 14:35:54,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1682210.0, ans=0.07 2024-08-12 14:36:06,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1682310.0, ans=0.2 2024-08-12 14:36:25,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=12.0 2024-08-12 14:36:26,274 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 14:36:31,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-12 14:36:39,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1682510.0, ans=0.125 2024-08-12 14:36:56,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8850, loss[loss=0.07077, beats_loss=0.01297, ecapa_loss=0.0001565, whisper_loss=0.05624, over 13434.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01113, ecapa_loss=0.000175, whisper_loss=0.09086, over 3856197.73 frames. ], batch size: 55, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:37:01,347 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 14:37:18,559 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 14:37:28,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1682810.0, ans=0.0 2024-08-12 14:38:01,702 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 14:38:16,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8900, loss[loss=0.07002, beats_loss=0.01386, ecapa_loss=0.0001579, whisper_loss=0.05458, over 18471.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01111, ecapa_loss=0.0001756, whisper_loss=0.09103, over 3848889.26 frames. ], batch size: 75, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:38:24,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1683110.0, ans=0.125 2024-08-12 14:38:30,086 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 14:38:37,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.454e+01 2.719e+01 3.172e+01 4.928e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-12 14:38:37,605 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 14:38:39,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1683210.0, ans=0.125 2024-08-12 14:38:45,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1683210.0, ans=0.125 2024-08-12 14:39:04,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1683310.0, ans=0.0 2024-08-12 14:39:22,576 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 14:39:29,597 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
23 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 14:39:31,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1683510.0, ans=0.0 2024-08-12 14:39:38,678 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 8950, loss[loss=0.1033, beats_loss=0.01148, ecapa_loss=0.0001549, whisper_loss=0.09024, over 23694.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001753, whisper_loss=0.09131, over 3857683.19 frames. ], batch size: 94, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:39:52,053 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 14:39:59,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1683710.0, ans=0.5 2024-08-12 14:40:08,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1683710.0, ans=0.125 2024-08-12 14:40:20,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1683810.0, ans=0.2 2024-08-12 14:40:49,022 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 14:40:59,040 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9000, loss[loss=0.09488, beats_loss=0.01199, ecapa_loss=0.000189, whisper_loss=0.081, over 20723.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01109, ecapa_loss=0.0001749, whisper_loss=0.09131, over 3854229.93 frames. ], batch size: 84, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:40:59,041 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 14:41:38,405 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.000585, whisper_loss=0.2487, over 922467.00 frames. 
2024-08-12 14:41:57,340 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on SV_voxceleb1: loss=0.004785, beats_loss=0, ecapa_loss=0.0004785, whisper_loss=0, over 939242.00 frames. 2024-08-12 14:43:56,575 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on AT_audioset: loss=0.02422, beats_loss=0.02422, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 14:43:56,579 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 14:44:09,055 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 14:44:12,537 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 14:44:15,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.474e+01 2.766e+01 3.028e+01 3.985e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:44:18,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1684210.0, ans=0.0 2024-08-12 14:44:27,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1684310.0, ans=0.125 2024-08-12 14:44:36,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1684310.0, ans=0.125 2024-08-12 14:44:42,258 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 15 from LS+wenet, 34 from Vox, 21 fro AS 2024-08-12 14:44:57,073 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
33 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 14:44:59,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1684510.0, ans=0.0 2024-08-12 14:45:09,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1684510.0, ans=0.1 2024-08-12 14:45:15,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9050, loss[loss=0.1085, beats_loss=0.01078, ecapa_loss=0.0001683, whisper_loss=0.096, over 19608.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01104, ecapa_loss=0.0001759, whisper_loss=0.09146, over 3854498.73 frames. ], batch size: 78, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:45:27,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1684610.0, ans=0.07 2024-08-12 14:45:34,814 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 14:45:37,979 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 14:45:58,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1684810.0, ans=0.125 2024-08-12 14:46:14,024 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 14:46:17,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1684910.0, ans=0.2 2024-08-12 14:46:23,161 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 14:46:24,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1685010.0, ans=0.1 2024-08-12 14:46:27,354 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
23 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 14:46:30,720 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 14:46:35,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9100, loss[loss=0.1033, beats_loss=0.01394, ecapa_loss=0.0001391, whisper_loss=0.08799, over 14899.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.0001767, whisper_loss=0.09132, over 3847849.87 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:46:40,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1685110.0, ans=0.1 2024-08-12 14:46:42,243 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 14:46:52,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.564e+01 2.836e+01 3.271e+01 5.149e+01, threshold=5.673e+01, percent-clipped=0.0 2024-08-12 14:47:12,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1685310.0, ans=0.125 2024-08-12 14:47:15,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1685310.0, ans=0.1 2024-08-12 14:47:22,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1685410.0, ans=0.035 2024-08-12 14:47:26,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1685410.0, ans=0.125 2024-08-12 14:47:26,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1685410.0, ans=0.025 2024-08-12 14:47:44,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, 
batch_count=1685510.0, ans=0.125 2024-08-12 14:47:51,594 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9150, loss[loss=0.1098, beats_loss=0.01273, ecapa_loss=0.0001588, whisper_loss=0.09551, over 22335.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001769, whisper_loss=0.09207, over 3880427.00 frames. ], batch size: 89, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:48:11,207 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-12 14:48:17,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1685710.0, ans=0.125 2024-08-12 14:48:32,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1685810.0, ans=0.125 2024-08-12 14:48:44,576 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 14:48:46,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-12 14:49:03,138 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 14:49:06,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9200, loss[loss=0.1065, beats_loss=0.01251, ecapa_loss=0.0001846, whisper_loss=0.09219, over 21088.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01111, ecapa_loss=0.0001767, whisper_loss=0.09145, over 3922894.64 frames. 
], batch size: 86, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:49:08,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1686110.0, ans=0.125 2024-08-12 14:49:23,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.540e+01 2.969e+01 3.284e+01 5.041e+01, threshold=5.938e+01, percent-clipped=0.0 2024-08-12 14:49:46,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1686310.0, ans=0.125 2024-08-12 14:49:49,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1686310.0, ans=0.2 2024-08-12 14:50:24,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9250, loss[loss=0.1134, beats_loss=0.009153, ecapa_loss=0.0001677, whisper_loss=0.1026, over 23047.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01113, ecapa_loss=0.0001758, whisper_loss=0.09101, over 3899814.31 frames. ], batch size: 92, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:50:39,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1686710.0, ans=0.0 2024-08-12 14:50:47,470 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 14:51:13,022 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 14:51:24,365 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 
32 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 14:51:30,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1686910.0, ans=0.125 2024-08-12 14:51:37,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1687010.0, ans=0.125 2024-08-12 14:51:43,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1687010.0, ans=0.125 2024-08-12 14:51:49,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9300, loss[loss=0.1212, beats_loss=0.01208, ecapa_loss=0.0001849, whisper_loss=0.1073, over 23344.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01118, ecapa_loss=0.000175, whisper_loss=0.09124, over 3919066.19 frames. ], batch size: 91, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:51:56,469 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 14:52:02,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1687210.0, ans=0.0 2024-08-12 14:52:09,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.511e+01 2.773e+01 3.215e+01 9.080e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-12 14:52:37,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1687310.0, ans=0.125 2024-08-12 14:52:49,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1687410.0, ans=0.0 2024-08-12 14:52:51,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1687410.0, ans=0.125 2024-08-12 14:52:56,963 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 14:53:00,779 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 14:53:04,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1687510.0, ans=0.2 2024-08-12 14:53:14,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9350, loss[loss=0.1076, beats_loss=0.01099, ecapa_loss=0.000164, whisper_loss=0.09498, over 15881.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01124, ecapa_loss=0.0001745, whisper_loss=0.0911, over 3906850.96 frames. ], batch size: 64, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:53:16,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1687610.0, ans=0.05 2024-08-12 14:53:22,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1687610.0, ans=0.125 2024-08-12 14:54:52,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9400, loss[loss=0.08216, beats_loss=0.01407, ecapa_loss=0.0001311, whisper_loss=0.06678, over 20402.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01121, ecapa_loss=0.0001751, whisper_loss=0.0907, over 3923723.78 frames. ], batch size: 83, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:55:13,664 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 14:55:16,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1688210.0, ans=0.1 2024-08-12 14:55:18,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.357e+01 2.577e+01 2.940e+01 4.355e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-12 14:55:37,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1688310.0, ans=0.1 2024-08-12 14:55:37,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=12.0 2024-08-12 14:55:38,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1688310.0, ans=0.125 2024-08-12 14:55:49,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1688410.0, ans=0.2 2024-08-12 14:55:51,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-08-12 14:55:58,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2024-08-12 14:56:15,749 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 14:56:22,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.14 vs. limit=15.0 2024-08-12 14:56:29,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9450, loss[loss=0.1117, beats_loss=0.009331, ecapa_loss=0.0002056, whisper_loss=0.1003, over 15180.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01112, ecapa_loss=0.0001764, whisper_loss=0.0909, over 3917300.06 frames. ], batch size: 61, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:56:29,249 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:56:32,553 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 14:56:56,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2024-08-12 14:56:59,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1688710.0, ans=0.125 2024-08-12 14:57:06,039 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 14:57:20,014 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-12 14:57:26,175 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09947884827852249, model_norm_threshold=51.535552978515625 2024-08-12 14:57:26,345 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.99, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.656e+05, grad_sumsq=2.952e+04, orig_rms_sq=8.999e+00 2024-08-12 14:57:46,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1689010.0, ans=0.2 2024-08-12 14:58:02,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9500, loss[loss=0.0924, beats_loss=0.0119, ecapa_loss=0.0001936, whisper_loss=0.07856, over 19502.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01112, ecapa_loss=0.0001762, whisper_loss=0.0904, over 3946764.61 frames. 
], batch size: 78, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:58:05,040 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 14:58:05,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2024-08-12 14:58:08,462 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 14:58:10,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1689110.0, ans=0.125 2024-08-12 14:58:10,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-12 14:58:16,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1689110.0, ans=0.125 2024-08-12 14:58:25,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.537e+01 2.807e+01 3.213e+01 5.181e+02, threshold=5.615e+01, percent-clipped=1.0 2024-08-12 14:58:27,001 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 14:58:42,712 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 14:58:51,824 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 14:58:58,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.31 vs. 
limit=15.0 2024-08-12 14:59:03,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1689410.0, ans=0.125 2024-08-12 14:59:06,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2024-08-12 14:59:08,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-12 14:59:13,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1689510.0, ans=0.2 2024-08-12 14:59:21,868 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:59:24,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1689510.0, ans=0.125 2024-08-12 14:59:26,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9550, loss[loss=0.08778, beats_loss=0.01166, ecapa_loss=0.0002112, whisper_loss=0.07401, over 18840.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01102, ecapa_loss=0.0001764, whisper_loss=0.09075, over 3908646.97 frames. ], batch size: 80, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:59:38,342 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.199e+00 2024-08-12 14:59:49,575 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 14:59:49,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1689710.0, ans=0.0 2024-08-12 15:00:01,551 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 15:00:23,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1690010.0, ans=0.09899494936611666 2024-08-12 15:00:24,886 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-12 15:00:29,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1690010.0, ans=0.04949747468305833 2024-08-12 15:00:35,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9600, loss[loss=0.08817, beats_loss=0.01191, ecapa_loss=0.0002286, whisper_loss=0.07397, over 15255.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.011, ecapa_loss=0.0001766, whisper_loss=0.09062, over 3891106.14 frames. ], batch size: 64, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:00:49,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1690210.0, ans=0.5 2024-08-12 15:00:49,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1690210.0, ans=0.125 2024-08-12 15:00:51,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1690210.0, ans=0.125 2024-08-12 15:00:51,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690210.0, ans=0.1 2024-08-12 15:00:52,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1690210.0, ans=0.0 2024-08-12 15:00:53,346 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.591e+01 2.857e+01 3.252e+01 5.691e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 15:00:54,325 INFO [scaling.py:1024] (1/4) Whitening: 
name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2024-08-12 15:00:55,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1690210.0, ans=0.1 2024-08-12 15:01:00,479 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 15:01:05,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690310.0, ans=0.1 2024-08-12 15:01:38,707 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 15:01:41,626 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 15:01:44,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9650, loss[loss=0.08825, beats_loss=0.01311, ecapa_loss=0.0001785, whisper_loss=0.07336, over 22046.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.0001776, whisper_loss=0.09157, over 3888547.66 frames. ], batch size: 92, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:01:51,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1690610.0, ans=0.125 2024-08-12 15:01:52,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1690610.0, ans=0.125 2024-08-12 15:02:01,109 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 15:02:17,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1690810.0, ans=0.04949747468305833 2024-08-12 15:02:20,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.75 vs. 
limit=22.5 2024-08-12 15:02:23,888 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 15:02:37,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1691010.0, ans=0.125 2024-08-12 15:02:43,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-08-12 15:02:45,761 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 15:02:53,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9700, loss[loss=0.1055, beats_loss=0.0109, ecapa_loss=0.0001461, whisper_loss=0.09315, over 19631.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001779, whisper_loss=0.09125, over 3900683.24 frames. ], batch size: 76, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:02:54,746 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 15:03:10,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.535e+01 2.821e+01 3.429e+01 6.519e+01, threshold=5.641e+01, percent-clipped=1.0 2024-08-12 15:03:27,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1691310.0, ans=0.2 2024-08-12 15:03:34,385 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 15:03:37,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1691410.0, ans=0.0 2024-08-12 15:03:37,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.66 vs. 
limit=12.0 2024-08-12 15:03:43,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1691410.0, ans=0.125 2024-08-12 15:03:44,999 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 15:04:04,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9750, loss[loss=0.08575, beats_loss=0.01142, ecapa_loss=0.0001892, whisper_loss=0.07243, over 17330.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001766, whisper_loss=0.09162, over 3854850.81 frames. ], batch size: 71, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:04:08,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-12 15:04:18,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1691710.0, ans=0.1 2024-08-12 15:04:22,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1691710.0, ans=0.125 2024-08-12 15:04:34,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1691810.0, ans=0.0 2024-08-12 15:04:40,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2024-08-12 15:05:01,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1692010.0, ans=0.125 2024-08-12 15:05:03,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.07 vs. 
limit=15.0 2024-08-12 15:05:12,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-08-12 15:05:12,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9800, loss[loss=0.1105, beats_loss=0.01134, ecapa_loss=0.0002098, whisper_loss=0.09703, over 18335.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001774, whisper_loss=0.09096, over 3871077.73 frames. ], batch size: 76, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:05:16,720 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 15:05:30,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.561e+01 2.818e+01 3.285e+01 1.389e+02, threshold=5.636e+01, percent-clipped=4.0 2024-08-12 15:05:46,966 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 15:06:04,507 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 15:06:04,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1692510.0, ans=0.0 2024-08-12 15:06:19,291 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9850, loss[loss=0.1005, beats_loss=0.01284, ecapa_loss=0.0001511, whisper_loss=0.08614, over 23147.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001766, whisper_loss=0.09126, over 3860395.71 frames. ], batch size: 94, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:06:22,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1692610.0, ans=0.125 2024-08-12 15:06:36,439 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 15:06:36,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1692710.0, ans=0.1 2024-08-12 15:06:37,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1692710.0, ans=0.2 2024-08-12 15:06:58,688 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 15:07:18,075 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-12 15:07:25,896 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 15:07:28,314 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9900, loss[loss=0.09626, beats_loss=0.0112, ecapa_loss=0.0001468, whisper_loss=0.08359, over 17407.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.0001762, whisper_loss=0.0922, over 3870600.98 frames. ], batch size: 68, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:07:42,818 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
17 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 15:07:46,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.533e+01 2.789e+01 3.190e+01 6.872e+01, threshold=5.578e+01, percent-clipped=1.0 2024-08-12 15:07:54,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1693210.0, ans=0.125 2024-08-12 15:08:08,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1693310.0, ans=0.125 2024-08-12 15:08:26,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1693510.0, ans=0.09899494936611666 2024-08-12 15:08:38,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 9950, loss[loss=0.09723, beats_loss=0.0112, ecapa_loss=0.0001752, whisper_loss=0.08428, over 20385.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01097, ecapa_loss=0.0001778, whisper_loss=0.0921, over 3878214.27 frames. ], batch size: 83, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:08:40,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=12.0 2024-08-12 15:08:50,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1693610.0, ans=0.0 2024-08-12 15:08:57,233 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 15:09:01,648 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 15:09:04,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1693710.0, ans=0.0 2024-08-12 15:09:13,054 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 15:09:22,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1693910.0, ans=0.125 2024-08-12 15:09:32,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1693910.0, ans=0.07 2024-08-12 15:09:45,172 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 15:09:46,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1694010.0, ans=0.125 2024-08-12 15:09:51,528 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 15:09:54,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10000, loss[loss=0.0826, beats_loss=0.01226, ecapa_loss=0.0001357, whisper_loss=0.06898, over 16214.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001779, whisper_loss=0.09227, over 3877282.44 frames. ], batch size: 60, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:10:11,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.583e+01 2.831e+01 3.339e+01 3.966e+02, threshold=5.663e+01, percent-clipped=2.0 2024-08-12 15:10:36,341 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 15:10:50,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1694510.0, ans=0.125 2024-08-12 15:11:01,728 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10050, loss[loss=0.1306, beats_loss=0.009184, ecapa_loss=0.0001819, whisper_loss=0.1196, over 19797.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001771, whisper_loss=0.09205, over 3839927.03 frames. 
], batch size: 77, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:11:07,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1694610.0, ans=0.0 2024-08-12 15:11:19,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-12 15:11:33,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1694810.0, ans=0.025 2024-08-12 15:11:43,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1694910.0, ans=0.0 2024-08-12 15:12:14,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10100, loss[loss=0.09747, beats_loss=0.01088, ecapa_loss=0.0001748, whisper_loss=0.08484, over 14592.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01097, ecapa_loss=0.0001767, whisper_loss=0.09196, over 3847209.61 frames. ], batch size: 58, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:12:21,130 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-12 15:12:27,063 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 15:12:33,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.463e+01 2.716e+01 3.042e+01 6.161e+01, threshold=5.433e+01, percent-clipped=3.0 2024-08-12 15:12:35,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1695210.0, ans=0.05 2024-08-12 15:13:02,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1695410.0, ans=0.1 2024-08-12 15:13:03,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2024-08-12 15:13:09,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1695410.0, ans=0.125 2024-08-12 15:13:10,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1695410.0, ans=0.0 2024-08-12 15:13:10,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1695410.0, ans=0.2 2024-08-12 15:13:13,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1695410.0, ans=0.125 2024-08-12 15:13:14,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-08-12 15:13:28,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10150, loss[loss=0.1104, beats_loss=0.008702, ecapa_loss=0.0001987, whisper_loss=0.09973, over 16290.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01098, ecapa_loss=0.000177, whisper_loss=0.09201, over 3882467.04 frames. 
], batch size: 61, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:13:32,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. limit=10.0 2024-08-12 15:13:35,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2024-08-12 15:13:39,651 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 15:13:41,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1695710.0, ans=0.125 2024-08-12 15:13:58,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1695810.0, ans=0.1 2024-08-12 15:14:03,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1695810.0, ans=0.125 2024-08-12 15:14:04,634 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.769e+00 2024-08-12 15:14:08,637 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 15:14:11,202 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 15:14:16,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1695910.0, ans=0.125 2024-08-12 15:14:23,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1696010.0, ans=0.0 2024-08-12 15:14:31,203 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
32 from LS+wenet, 7 from Vox, 25 fro AS 2024-08-12 15:14:36,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10200, loss[loss=0.1171, beats_loss=0.0106, ecapa_loss=0.0001749, whisper_loss=0.1048, over 16866.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.0001771, whisper_loss=0.09168, over 3878282.23 frames. ], batch size: 65, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:14:49,212 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 15:14:54,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.514e+01 2.832e+01 3.281e+01 6.809e+01, threshold=5.664e+01, percent-clipped=1.0 2024-08-12 15:15:04,676 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.670e-03 2024-08-12 15:15:08,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1696310.0, ans=0.125 2024-08-12 15:15:12,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1696310.0, ans=0.1 2024-08-12 15:15:29,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1696410.0, ans=0.125 2024-08-12 15:15:35,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1696510.0, ans=0.125 2024-08-12 15:15:36,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1696510.0, ans=0.0 2024-08-12 15:15:43,357 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 15:15:46,091 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10250, loss[loss=0.09937, beats_loss=0.01197, ecapa_loss=0.0001575, whisper_loss=0.08582, over 22865.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01103, ecapa_loss=0.0001775, whisper_loss=0.09138, over 3898261.00 frames. ], batch size: 91, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:15:48,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1696610.0, ans=0.0 2024-08-12 15:15:49,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1696610.0, ans=0.125 2024-08-12 15:15:51,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1696610.0, ans=0.2 2024-08-12 15:15:57,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1696610.0, ans=0.125 2024-08-12 15:16:20,686 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 15:16:26,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1696910.0, ans=0.0 2024-08-12 15:16:50,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1697010.0, ans=0.5 2024-08-12 15:16:54,442 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 15:16:57,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10300, loss[loss=0.08702, beats_loss=0.01238, ecapa_loss=0.0001485, whisper_loss=0.07315, over 18219.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001772, whisper_loss=0.09138, over 3892043.96 frames. ], batch size: 72, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:17:06,367 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 15:17:16,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.570e+01 2.801e+01 3.230e+01 4.716e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-12 15:17:16,690 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 15:17:52,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1697410.0, ans=0.2 2024-08-12 15:18:02,671 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 14 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 15:18:02,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1697510.0, ans=0.0 2024-08-12 15:18:09,641 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10350, loss[loss=0.1026, beats_loss=0.01116, ecapa_loss=0.0001948, whisper_loss=0.08953, over 21866.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001765, whisper_loss=0.09131, over 3906548.10 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:18:27,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1697710.0, ans=0.125 2024-08-12 15:18:50,473 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-12 15:18:51,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1697910.0, ans=0.0 2024-08-12 15:19:17,183 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10400, loss[loss=0.1205, beats_loss=0.008898, ecapa_loss=0.0002211, whisper_loss=0.1094, over 21594.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01103, ecapa_loss=0.0001761, whisper_loss=0.09107, over 3893385.64 frames. 
], batch size: 88, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:19:17,313 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 15:19:35,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.431e+01 2.766e+01 3.090e+01 4.882e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 15:19:58,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1698410.0, ans=0.0 2024-08-12 15:20:24,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10450, loss[loss=0.07481, beats_loss=0.01569, ecapa_loss=0.0001502, whisper_loss=0.05762, over 20118.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01112, ecapa_loss=0.0001752, whisper_loss=0.09047, over 3861715.86 frames. ], batch size: 86, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:20:36,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1698610.0, ans=0.07 2024-08-12 15:20:38,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1698710.0, ans=0.0 2024-08-12 15:20:49,676 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 15:21:21,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1699010.0, ans=0.5 2024-08-12 15:21:21,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.41 vs. 
limit=15.0 2024-08-12 15:21:25,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1699010.0, ans=0.125 2024-08-12 15:21:32,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2024-08-12 15:21:32,865 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10500, loss[loss=0.1023, beats_loss=0.01241, ecapa_loss=0.0001593, whisper_loss=0.08833, over 21795.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01109, ecapa_loss=0.0001747, whisper_loss=0.09093, over 3850344.52 frames. ], batch size: 88, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:21:36,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1699110.0, ans=0.125 2024-08-12 15:21:36,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1699110.0, ans=0.0 2024-08-12 15:21:42,663 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 15:21:44,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1699110.0, ans=0.5 2024-08-12 15:21:50,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.539e+01 2.734e+01 3.108e+01 4.878e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 15:21:52,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-08-12 15:22:00,234 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 15:22:06,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1699310.0, ans=0.125 2024-08-12 15:22:26,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1699510.0, ans=0.0 2024-08-12 15:22:35,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-08-12 15:22:37,869 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 15:22:40,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10550, loss[loss=0.105, beats_loss=0.01165, ecapa_loss=0.000171, whisper_loss=0.09159, over 21462.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001743, whisper_loss=0.09165, over 3852995.00 frames. ], batch size: 84, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:22:43,855 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 15:23:04,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.40 vs. limit=6.0 2024-08-12 15:23:16,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1699810.0, ans=0.1 2024-08-12 15:23:23,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1699910.0, ans=0.0 2024-08-12 15:23:26,800 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 15:23:29,607 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-12 15:23:37,111 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 15:23:44,220 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 15:23:47,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1700010.0, ans=0.05 2024-08-12 15:23:51,690 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 15:23:52,961 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10600, loss[loss=0.09734, beats_loss=0.01045, ecapa_loss=0.0001869, whisper_loss=0.08502, over 14942.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01104, ecapa_loss=0.0001741, whisper_loss=0.09055, over 3864102.67 frames. ], batch size: 61, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:23:56,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1700110.0, ans=0.125 2024-08-12 15:23:57,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1700110.0, ans=0.1 2024-08-12 15:24:07,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1700210.0, ans=0.125 2024-08-12 15:24:13,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.487e+01 2.727e+01 3.054e+01 5.238e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 15:24:17,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1700210.0, ans=0.125 2024-08-12 15:24:20,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1700210.0, ans=0.125 2024-08-12 15:24:27,130 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
23 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-12 15:24:28,507 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 15:24:38,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=22.5 2024-08-12 15:24:49,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1700410.0, ans=0.125 2024-08-12 15:24:49,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1700410.0, ans=0.125 2024-08-12 15:25:07,465 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10650, loss[loss=0.1197, beats_loss=0.01052, ecapa_loss=0.0001824, whisper_loss=0.1073, over 23132.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001733, whisper_loss=0.09149, over 3868896.88 frames. ], batch size: 92, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:25:08,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1700610.0, ans=0.0 2024-08-12 15:25:15,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1700610.0, ans=0.125 2024-08-12 15:25:25,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1700710.0, ans=0.125 2024-08-12 15:25:29,544 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 15:25:37,685 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 15:25:39,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1700810.0, ans=0.0 2024-08-12 15:25:56,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1700910.0, ans=0.125 2024-08-12 15:25:59,658 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-12 15:26:05,981 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 15:26:13,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1701010.0, ans=0.1 2024-08-12 15:26:20,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10700, loss[loss=0.1091, beats_loss=0.0109, ecapa_loss=0.000157, whisper_loss=0.09664, over 14951.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01091, ecapa_loss=0.0001731, whisper_loss=0.09257, over 3893741.95 frames. ], batch size: 57, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:26:21,893 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 39 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 15:26:31,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-12 15:26:34,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1701210.0, ans=0.0 2024-08-12 15:26:39,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.524e+01 2.760e+01 3.145e+01 5.039e+01, threshold=5.520e+01, percent-clipped=0.0 2024-08-12 15:26:42,420 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 15:26:46,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1701310.0, ans=0.2 2024-08-12 15:26:57,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1701310.0, ans=0.0 2024-08-12 15:27:00,281 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 15:27:00,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1701410.0, ans=0.125 2024-08-12 15:27:04,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1701410.0, ans=0.0 2024-08-12 15:27:13,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1701510.0, ans=0.0 2024-08-12 15:27:20,234 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 15:27:23,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2024-08-12 15:27:24,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-08-12 15:27:27,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10750, loss[loss=0.1227, beats_loss=0.008151, ecapa_loss=0.0002273, whisper_loss=0.1122, over 20991.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001742, whisper_loss=0.09251, over 3873840.15 frames. 
], batch size: 90, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:27:30,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1701610.0, ans=0.035 2024-08-12 15:27:31,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-08-12 15:27:44,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1701710.0, ans=0.125 2024-08-12 15:27:51,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-08-12 15:27:52,269 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 15:28:01,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1701810.0, ans=0.2 2024-08-12 15:28:17,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1701910.0, ans=10.0 2024-08-12 15:28:28,543 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 21 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 15:28:31,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1702010.0, ans=0.125 2024-08-12 15:28:34,226 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 15:28:35,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10800, loss[loss=0.1061, beats_loss=0.01059, ecapa_loss=0.0001762, whisper_loss=0.09375, over 20932.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001751, whisper_loss=0.09155, over 3863172.63 frames. 
], batch size: 84, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:28:54,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.536e+01 2.905e+01 3.267e+01 1.637e+02, threshold=5.810e+01, percent-clipped=2.0
2024-08-12 15:28:59,760 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 from AS
2024-08-12 15:29:14,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1702410.0, ans=0.125
2024-08-12 15:29:33,191 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 18 from Vox, 33 from AS
2024-08-12 15:29:42,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10850, loss[loss=0.1069, beats_loss=0.01114, ecapa_loss=0.0001475, whisper_loss=0.09432, over 15835.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01104, ecapa_loss=0.000175, whisper_loss=0.09253, over 3876678.67 frames. ], batch size: 61, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:29:55,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1702710.0, ans=0.125
2024-08-12 15:30:00,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1702710.0, ans=0.0
2024-08-12 15:30:00,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1702710.0, ans=0.1
2024-08-12 15:30:06,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
2024-08-12 15:30:23,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1702910.0, ans=0.125
2024-08-12 15:30:24,328 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 from AS
2024-08-12 15:30:26,968 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS
2024-08-12 15:30:29,768 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 15:30:44,823 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-12 15:30:50,219 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10900, loss[loss=0.1087, beats_loss=0.01158, ecapa_loss=0.0001831, whisper_loss=0.09532, over 19409.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001742, whisper_loss=0.09254, over 3893331.51 frames. ], batch size: 79, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:30:53,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1703110.0, ans=0.0
2024-08-12 15:31:02,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1703210.0, ans=0.125
2024-08-12 15:31:06,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0
2024-08-12 15:31:08,861 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.493e+01 2.855e+01 3.171e+01 4.648e+01, threshold=5.710e+01, percent-clipped=0.0
2024-08-12 15:31:12,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5
2024-08-12 15:31:14,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0
2024-08-12 15:31:22,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1703310.0, ans=0.1
2024-08-12 15:31:28,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1703310.0, ans=0.0
2024-08-12 15:31:55,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1703510.0, ans=0.125
2024-08-12 15:31:57,166 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 10950, loss[loss=0.1282, beats_loss=0.00987, ecapa_loss=0.0001679, whisper_loss=0.1167, over 23883.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001736, whisper_loss=0.0923, over 3914406.18 frames. ], batch size: 93, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:31:57,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1703610.0, ans=0.1
2024-08-12 15:32:13,524 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS
2024-08-12 15:32:21,461 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 from AS
2024-08-12 15:32:25,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1703810.0, ans=0.125
2024-08-12 15:32:33,040 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 15:32:55,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1703910.0, ans=0.125
2024-08-12 15:33:01,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0
2024-08-12 15:33:03,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0
2024-08-12 15:33:05,320 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 17 from LS+wenet, 19 from Vox, 42 from AS
2024-08-12 15:33:13,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11000, loss[loss=0.1046, beats_loss=0.01101, ecapa_loss=0.0001927, whisper_loss=0.09168, over 21829.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001738, whisper_loss=0.09215, over 3921459.39 frames. ], batch size: 92, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:33:16,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1704110.0, ans=10.0
2024-08-12 15:33:19,365 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 15:33:24,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1704110.0, ans=0.125
2024-08-12 15:33:32,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.453e+01 2.776e+01 3.261e+01 5.617e+01, threshold=5.552e+01, percent-clipped=0.0
2024-08-12 15:33:37,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2024-08-12 15:33:39,664 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 15:33:49,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1704310.0, ans=0.125
2024-08-12 15:33:57,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1704410.0, ans=0.0
2024-08-12 15:34:11,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1704510.0, ans=0.125
2024-08-12 15:34:21,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11050, loss[loss=0.0866, beats_loss=0.01249, ecapa_loss=0.0001612, whisper_loss=0.0725, over 19866.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001744, whisper_loss=0.09191, over 3901403.12 frames. ], batch size: 81, lr: 5.20e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:34:25,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2024-08-12 15:34:25,559 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 11 from LS+wenet, 19 from Vox, 38 from AS
2024-08-12 15:34:34,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1704710.0, ans=0.125
2024-08-12 15:34:35,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0
2024-08-12 15:34:38,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1704710.0, ans=0.125
2024-08-12 15:34:45,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1704710.0, ans=0.0
2024-08-12 15:34:47,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1704810.0, ans=0.1
2024-08-12 15:34:47,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1704810.0, ans=0.0
2024-08-12 15:35:00,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1704910.0, ans=0.025
2024-08-12 15:35:04,328 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 from AS
2024-08-12 15:35:04,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1704910.0, ans=0.0
2024-08-12 15:35:14,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1705010.0, ans=0.125
2024-08-12 15:35:18,117 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 from AS
2024-08-12 15:35:18,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1705010.0, ans=0.125
2024-08-12 15:35:19,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1705010.0, ans=0.0
2024-08-12 15:35:29,218 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11100, loss[loss=0.08724, beats_loss=0.01262, ecapa_loss=0.0001632, whisper_loss=0.07299, over 14859.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001738, whisper_loss=0.09189, over 3890663.08 frames. ], batch size: 59, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:35:35,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1705110.0, ans=15.0
2024-08-12 15:35:45,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5
2024-08-12 15:35:48,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.398e+01 2.655e+01 3.117e+01 6.342e+01, threshold=5.309e+01, percent-clipped=1.0
2024-08-12 15:35:54,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1705210.0, ans=0.0
2024-08-12 15:36:11,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0
2024-08-12 15:36:20,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1705410.0, ans=0.125
2024-08-12 15:36:27,579 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS
2024-08-12 15:36:37,354 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS
2024-08-12 15:36:38,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11150, loss[loss=0.1036, beats_loss=0.008913, ecapa_loss=0.0001617, whisper_loss=0.09305, over 16399.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001741, whisper_loss=0.0923, over 3887297.49 frames. ], batch size: 62, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:36:46,657 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS
2024-08-12 15:36:54,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=8.0
2024-08-12 15:36:58,737 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 from AS
2024-08-12 15:36:58,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1705710.0, ans=0.125
2024-08-12 15:37:05,548 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 from AS
2024-08-12 15:37:05,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1705810.0, ans=0.1
2024-08-12 15:37:07,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1705810.0, ans=0.0
2024-08-12 15:37:20,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1705910.0, ans=0.125
2024-08-12 15:37:28,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1705910.0, ans=0.125
2024-08-12 15:37:33,918 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 15 from Vox, 43 from AS
2024-08-12 15:37:46,074 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11200, loss[loss=0.0954, beats_loss=0.01173, ecapa_loss=0.0001518, whisper_loss=0.08215, over 22231.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001741, whisper_loss=0.09238, over 3890643.30 frames. ], batch size: 88, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:37:48,851 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 from AS
2024-08-12 15:38:03,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0
2024-08-12 15:38:05,486 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.490e+01 2.836e+01 3.047e+01 5.086e+01, threshold=5.671e+01, percent-clipped=0.0
2024-08-12 15:38:08,106 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 18 from Vox, 28 from AS
2024-08-12 15:38:32,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1706410.0, ans=0.0
2024-08-12 15:38:34,939 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS
2024-08-12 15:38:36,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1706410.0, ans=0.0
2024-08-12 15:38:42,921 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 36 from LS+wenet, 22 from Vox, 28 from AS
2024-08-12 15:38:53,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11250, loss[loss=0.1005, beats_loss=0.01252, ecapa_loss=0.0001561, whisper_loss=0.08642, over 19028.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01088, ecapa_loss=0.000174, whisper_loss=0.0924, over 3896733.54 frames. ], batch size: 75, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:39:11,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1706710.0, ans=0.125
2024-08-12 15:39:14,162 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS
2024-08-12 15:39:19,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1706810.0, ans=0.0
2024-08-12 15:39:28,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1706810.0, ans=0.1
2024-08-12 15:39:37,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0
2024-08-12 15:40:01,552 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11300, loss[loss=0.09478, beats_loss=0.01208, ecapa_loss=0.0002093, whisper_loss=0.0806, over 15465.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01076, ecapa_loss=0.0001744, whisper_loss=0.09266, over 3894549.19 frames. ], batch size: 64, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:40:14,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1707210.0, ans=0.0
2024-08-12 15:40:20,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.567e+01 2.768e+01 3.157e+01 8.223e+01, threshold=5.536e+01, percent-clipped=2.0
2024-08-12 15:40:40,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1707310.0, ans=0.125
2024-08-12 15:40:57,603 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 11 from Vox, 27 from AS
2024-08-12 15:40:57,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1707510.0, ans=0.1
2024-08-12 15:40:58,894 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS
2024-08-12 15:41:05,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1707510.0, ans=0.0
2024-08-12 15:41:10,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11350, loss[loss=0.1005, beats_loss=0.008174, ecapa_loss=0.0001629, whisper_loss=0.09073, over 14292.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01077, ecapa_loss=0.0001737, whisper_loss=0.09259, over 3883533.99 frames. ], batch size: 54, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:41:37,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1707810.0, ans=0.95
2024-08-12 15:42:02,900 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS
2024-08-12 15:42:03,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1708010.0, ans=0.125
2024-08-12 15:42:07,958 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS
2024-08-12 15:42:17,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11400, loss[loss=0.09219, beats_loss=0.01123, ecapa_loss=0.000173, whisper_loss=0.07923, over 17372.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01078, ecapa_loss=0.0001737, whisper_loss=0.09258, over 3892139.13 frames. ], batch size: 71, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:42:26,974 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 27 from Vox, 24 from AS
2024-08-12 15:42:36,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.714e+01 3.019e+01 3.288e+01 4.590e+01, threshold=6.038e+01, percent-clipped=0.0
2024-08-12 15:42:40,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1708210.0, ans=0.0
2024-08-12 15:42:45,142 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 15 from Vox, 23 from AS
2024-08-12 15:43:08,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1708410.0, ans=0.5
2024-08-12 15:43:11,059 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS
2024-08-12 15:43:25,906 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11450, loss[loss=0.1005, beats_loss=0.01507, ecapa_loss=0.0001428, whisper_loss=0.08404, over 21349.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0109, ecapa_loss=0.0001715, whisper_loss=0.09295, over 3940007.58 frames. ], batch size: 84, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:43:26,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1708610.0, ans=0.1
2024-08-12 15:43:31,379 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 from AS
2024-08-12 15:43:32,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1708610.0, ans=0.125
2024-08-12 15:43:45,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1708710.0, ans=0.09899494936611666
2024-08-12 15:43:51,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1708810.0, ans=0.125
2024-08-12 15:43:56,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1708810.0, ans=0.125
2024-08-12 15:44:08,601 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 from AS
2024-08-12 15:44:11,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0
2024-08-12 15:44:12,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1708910.0, ans=0.1
2024-08-12 15:44:15,300 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 from AS
2024-08-12 15:44:20,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1709010.0, ans=0.125
2024-08-12 15:44:21,030 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-12 15:44:31,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1709010.0, ans=0.0
2024-08-12 15:44:34,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11500, loss[loss=0.08726, beats_loss=0.01208, ecapa_loss=0.0001675, whisper_loss=0.07351, over 21487.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01098, ecapa_loss=0.000171, whisper_loss=0.09249, over 3957780.89 frames. ], batch size: 87, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:44:35,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0
2024-08-12 15:44:41,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1709110.0, ans=0.125
2024-08-12 15:44:50,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=15.0
2024-08-12 15:44:54,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.425e+01 2.764e+01 3.070e+01 5.781e+01, threshold=5.529e+01, percent-clipped=0.0
2024-08-12 15:44:58,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1709210.0, ans=0.1
2024-08-12 15:45:03,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1709310.0, ans=0.2
2024-08-12 15:45:18,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1709410.0, ans=0.0
2024-08-12 15:45:20,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1709410.0, ans=0.0
2024-08-12 15:45:20,366 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2024-08-12 15:45:21,152 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS
2024-08-12 15:45:40,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1709510.0, ans=0.07
2024-08-12 15:45:42,882 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 from AS
2024-08-12 15:45:47,349 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11550, loss[loss=0.09307, beats_loss=0.01264, ecapa_loss=0.0001565, whisper_loss=0.07886, over 21810.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001736, whisper_loss=0.09181, over 3946168.16 frames. ], batch size: 88, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:45:53,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1709610.0, ans=0.125
2024-08-12 15:46:12,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1709810.0, ans=0.1
2024-08-12 15:46:29,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1709910.0, ans=0.125
2024-08-12 15:46:41,456 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 from AS
2024-08-12 15:46:47,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1710010.0, ans=0.125
2024-08-12 15:47:03,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1710110.0, ans=0.2
2024-08-12 15:47:04,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11600, loss[loss=0.0956, beats_loss=0.00948, ecapa_loss=0.0002333, whisper_loss=0.08378, over 18666.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01089, ecapa_loss=0.0001749, whisper_loss=0.09262, over 3983460.59 frames. ], batch size: 78, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:47:18,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2024-08-12 15:47:32,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.592e+01 2.931e+01 3.257e+01 5.066e+01, threshold=5.862e+01, percent-clipped=0.0
2024-08-12 15:47:39,687 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 from AS
2024-08-12 15:47:46,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5
2024-08-12 15:48:47,200 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 from AS
2024-08-12 15:48:51,745 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11650, loss[loss=0.09508, beats_loss=0.01196, ecapa_loss=0.0002341, whisper_loss=0.08078, over 20953.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001753, whisper_loss=0.0926, over 3994770.64 frames. ], batch size: 89, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:49:00,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1710610.0, ans=10.0
2024-08-12 15:49:05,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.29 vs. limit=10.0
2024-08-12 15:49:10,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1710610.0, ans=0.125
2024-08-12 15:49:49,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5
2024-08-12 15:49:56,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1710810.0, ans=0.125
2024-08-12 15:49:57,624 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 from AS
2024-08-12 15:50:24,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1710910.0, ans=0.125
2024-08-12 15:50:29,043 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 15:50:32,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1710910.0, ans=0.0
2024-08-12 15:50:49,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1711010.0, ans=0.0
2024-08-12 15:50:53,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1711010.0, ans=0.125
2024-08-12 15:51:06,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11700, loss[loss=0.1217, beats_loss=0.01028, ecapa_loss=0.0001901, whisper_loss=0.1095, over 20945.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01092, ecapa_loss=0.0001757, whisper_loss=0.09268, over 3975754.55 frames. ], batch size: 83, lr: 5.19e-03, grad_scale: 5.764607523034235e+17
2024-08-12 15:51:08,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1711110.0, ans=0.0
2024-08-12 15:51:16,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. limit=6.0
2024-08-12 15:51:21,218 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 from AS
2024-08-12 15:51:45,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.679e+01 3.031e+01 3.384e+01 8.068e+01, threshold=6.063e+01, percent-clipped=1.0
2024-08-12 15:51:45,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1711210.0, ans=0.125
2024-08-12 15:52:05,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1711310.0, ans=0.125
2024-08-12 15:52:14,003 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 from AS
2024-08-12 15:52:20,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1711310.0, ans=0.0
2024-08-12 15:52:22,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1711310.0, ans=0.125
2024-08-12 15:52:32,796 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 from AS
2024-08-12 15:52:42,111 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 15:52:53,992 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.033e+02
2024-08-12 15:53:20,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11750, loss[loss=0.08807, beats_loss=0.01495, ecapa_loss=0.0001707, whisper_loss=0.07141, over 18363.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001754, whisper_loss=0.09256, over 3959228.26 frames. ], batch size: 74, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 15:53:21,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1711610.0, ans=0.04949747468305833
2024-08-12 15:53:30,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0
2024-08-12 15:53:52,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1711710.0, ans=0.2
2024-08-12 15:54:13,743 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 from AS
2024-08-12 15:54:30,892 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 from AS
2024-08-12 15:54:49,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1712010.0, ans=0.0
2024-08-12 15:54:52,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1712010.0, ans=0.2
2024-08-12 15:55:02,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11800, loss[loss=0.1116, beats_loss=0.01369, ecapa_loss=0.0001435, whisper_loss=0.09645, over 22558.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01099, ecapa_loss=0.0001747, whisper_loss=0.09289, over 3939790.93 frames. ], batch size: 89, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 15:55:30,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.421e+01 2.823e+01 3.255e+01 8.063e+01, threshold=5.645e+01, percent-clipped=1.0
2024-08-12 15:55:33,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1712210.0, ans=0.0
2024-08-12 15:55:47,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1712310.0, ans=0.125
2024-08-12 15:56:00,056 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0
2024-08-12 15:56:01,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5
2024-08-12 15:56:15,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1712510.0, ans=0.1
2024-08-12 15:56:20,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1712510.0, ans=0.0
2024-08-12 15:56:28,255 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 from AS
2024-08-12 15:56:31,275 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11850, loss[loss=0.1008, beats_loss=0.013, ecapa_loss=0.0001663, whisper_loss=0.08618, over 20958.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001744, whisper_loss=0.092, over 3916040.06 frames. ], batch size: 84, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 15:56:33,089 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 from AS
2024-08-12 15:56:36,069 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 from AS
2024-08-12 15:56:40,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1712610.0, ans=0.0
2024-08-12 15:56:50,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1712710.0, ans=0.0
2024-08-12 15:56:55,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1712710.0, ans=0.125
2024-08-12 15:56:58,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1712710.0, ans=0.1
2024-08-12 15:57:12,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0
2024-08-12 15:57:29,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1712910.0, ans=0.125
2024-08-12 15:57:35,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5
2024-08-12 15:57:38,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1713010.0, ans=0.0
2024-08-12 15:57:58,480 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11900, loss[loss=0.1116, beats_loss=0.008921, ecapa_loss=0.0001911, whisper_loss=0.1008, over 20210.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001741, whisper_loss=0.09186, over 3948212.17 frames. ], batch size: 79, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17
2024-08-12 15:58:06,630 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.300e-01
2024-08-12 15:58:15,254 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS
2024-08-12 15:58:16,977 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 27 from Vox, 26 from AS
2024-08-12 15:58:23,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1713210.0, ans=0.0
2024-08-12 15:58:24,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.471e+01 2.746e+01 3.069e+01 1.141e+02, threshold=5.492e+01, percent-clipped=1.0
2024-08-12 15:58:41,864 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 from AS
2024-08-12 15:58:47,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1713310.0, ans=0.125
2024-08-12 15:58:53,334 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 from AS
2024-08-12 15:58:59,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1713410.0, ans=0.125
2024-08-12 15:59:08,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1713510.0, ans=0.125
2024-08-12 15:59:24,024 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 11950, loss[loss=0.1102, beats_loss=0.01036, ecapa_loss=0.0001761, whisper_loss=0.0981, over 19571.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001751, whisper_loss=0.09241, over 3934933.91 frames.
], batch size: 77, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:59:40,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1713710.0, ans=0.0 2024-08-12 15:59:53,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-12 16:00:05,968 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:00:08,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1713810.0, ans=0.015 2024-08-12 16:00:13,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1713810.0, ans=0.0 2024-08-12 16:00:21,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.26 vs. limit=15.0 2024-08-12 16:00:41,392 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 16:00:43,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-12 16:00:50,423 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12000, loss[loss=0.1023, beats_loss=0.01141, ecapa_loss=0.0001568, whisper_loss=0.08933, over 21854.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01095, ecapa_loss=0.0001745, whisper_loss=0.09288, over 3931661.96 frames. ], batch size: 89, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:00:50,424 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 16:01:32,505 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005955, whisper_loss=0.2482, over 922467.00 frames. 
2024-08-12 16:01:39,102 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([9.9207e-04, 1.7601e-02, 1.0277e-02, 3.2970e+00, 6.1469e-03, 5.9674e-02, 5.6185e-02, 4.2570e-02], device='cuda:1') 2024-08-12 16:01:51,932 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on SV_voxceleb1: loss=0.004759, beats_loss=0, ecapa_loss=0.0004759, whisper_loss=0, over 939242.00 frames. 2024-08-12 16:02:25,753 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2232, 1.6355, 1.7336, 1.6145], device='cuda:1') 2024-08-12 16:03:43,541 INFO [train_multi_KD3.py:1149] (1/4) Epoch 12, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 16:03:43,544 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 16:03:43,654 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 16:03:45,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1714110.0, ans=0.0 2024-08-12 16:03:49,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1714110.0, ans=0.125 2024-08-12 16:03:51,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1714110.0, ans=0.1 2024-08-12 16:04:03,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1714210.0, ans=0.125 2024-08-12 16:04:05,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1714210.0, ans=0.125 2024-08-12 16:04:06,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.437e+01 2.734e+01 3.186e+01 7.564e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 16:04:29,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1714410.0, ans=0.0 2024-08-12 16:04:33,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1714410.0, ans=0.0 2024-08-12 16:04:37,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1714410.0, ans=0.0 2024-08-12 16:04:44,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1714510.0, ans=0.125 2024-08-12 16:04:48,026 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
24 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-12 16:04:53,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1714510.0, ans=0.0 2024-08-12 16:04:59,170 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12050, loss[loss=0.1027, beats_loss=0.009792, ecapa_loss=0.0001792, whisper_loss=0.09109, over 17162.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01099, ecapa_loss=0.0001745, whisper_loss=0.09241, over 3902707.70 frames. ], batch size: 67, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:05:05,110 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 16:05:10,466 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 23 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-12 16:05:40,182 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:05:40,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-08-12 16:05:52,938 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 16:06:01,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1715010.0, ans=0.2 2024-08-12 16:06:05,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-12 16:06:15,843 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12100, loss[loss=0.1047, beats_loss=0.009754, ecapa_loss=0.0001764, whisper_loss=0.09319, over 22084.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001743, whisper_loss=0.09176, over 3853989.43 frames. 
], batch size: 91, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:06:27,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1715110.0, ans=0.025 2024-08-12 16:06:38,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.371e+01 2.653e+01 2.949e+01 4.098e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-12 16:06:52,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1715310.0, ans=0.1 2024-08-12 16:06:57,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1715310.0, ans=0.1 2024-08-12 16:07:04,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1715410.0, ans=0.0 2024-08-12 16:07:27,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1715510.0, ans=0.05 2024-08-12 16:07:28,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1715510.0, ans=0.125 2024-08-12 16:07:36,899 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12150, loss[loss=0.1238, beats_loss=0.008482, ecapa_loss=0.0002217, whisper_loss=0.1131, over 21358.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01091, ecapa_loss=0.0001741, whisper_loss=0.09273, over 3878204.88 frames. ], batch size: 88, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:07:38,954 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 16:07:41,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1715610.0, ans=0.125 2024-08-12 16:08:29,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1715910.0, ans=0.125 2024-08-12 16:08:33,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1715910.0, ans=0.125 2024-08-12 16:08:51,805 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12200, loss[loss=0.08359, beats_loss=0.01187, ecapa_loss=0.0001423, whisper_loss=0.07029, over 17624.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001744, whisper_loss=0.09207, over 3881415.85 frames. ], batch size: 69, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:08:55,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1716110.0, ans=0.125 2024-08-12 16:09:08,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1716210.0, ans=0.0 2024-08-12 16:09:12,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1716210.0, ans=0.125 2024-08-12 16:09:12,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.85 vs. limit=10.0 2024-08-12 16:09:13,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.462e+01 2.887e+01 3.237e+01 1.771e+02, threshold=5.773e+01, percent-clipped=2.0 2024-08-12 16:09:28,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.31 vs. 
limit=15.0 2024-08-12 16:09:35,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1716410.0, ans=0.0 2024-08-12 16:09:40,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.22 vs. limit=22.5 2024-08-12 16:09:41,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1716410.0, ans=0.0 2024-08-12 16:09:54,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5 2024-08-12 16:09:55,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1716510.0, ans=0.125 2024-08-12 16:10:07,305 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12250, loss[loss=0.08508, beats_loss=0.01137, ecapa_loss=0.0001413, whisper_loss=0.07229, over 15597.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01085, ecapa_loss=0.0001758, whisper_loss=0.09262, over 3883305.68 frames. ], batch size: 60, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:10:14,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2024-08-12 16:10:28,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1716710.0, ans=0.2 2024-08-12 16:10:29,809 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 16:10:39,260 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 16:10:42,655 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 16:10:44,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1716810.0, ans=0.2 2024-08-12 16:10:44,663 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-12 16:11:05,920 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 16:11:20,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1717010.0, ans=0.125 2024-08-12 16:11:26,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1717110.0, ans=0.1 2024-08-12 16:11:27,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12300, loss[loss=0.09491, beats_loss=0.01089, ecapa_loss=0.0001699, whisper_loss=0.08232, over 16831.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001749, whisper_loss=0.09244, over 3894583.03 frames. ], batch size: 64, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:11:27,841 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 16:11:28,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1717110.0, ans=0.0 2024-08-12 16:11:52,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.615e+01 2.930e+01 3.275e+01 9.862e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 16:12:16,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1717410.0, ans=0.125 2024-08-12 16:12:28,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1717410.0, ans=0.125 2024-08-12 16:12:35,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1717510.0, ans=0.0 2024-08-12 16:12:51,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12350, loss[loss=0.09657, beats_loss=0.01155, ecapa_loss=0.0001904, whisper_loss=0.08312, over 14278.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01091, ecapa_loss=0.0001778, whisper_loss=0.09278, over 3908228.22 frames. ], batch size: 57, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:12:52,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1717610.0, ans=0.125 2024-08-12 16:12:53,554 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 16:13:12,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1717710.0, ans=0.0 2024-08-12 16:13:13,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1717710.0, ans=0.125 2024-08-12 16:13:18,427 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 16:13:46,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1717910.0, ans=0.125 2024-08-12 16:13:50,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1717910.0, ans=0.125 2024-08-12 16:14:04,571 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 16:14:11,674 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 16:14:14,242 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12400, loss[loss=0.06852, beats_loss=0.01475, ecapa_loss=0.0001485, whisper_loss=0.05229, over 14340.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01101, ecapa_loss=0.0001747, whisper_loss=0.0921, over 3909094.87 frames. ], batch size: 58, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:14:40,097 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.684e+01 3.067e+01 3.396e+01 5.308e+01, threshold=6.133e+01, percent-clipped=1.0 2024-08-12 16:15:08,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1718410.0, ans=0.125 2024-08-12 16:15:09,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1718410.0, ans=0.125 2024-08-12 16:15:36,918 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12450, loss[loss=0.08049, beats_loss=0.01276, ecapa_loss=0.0001839, whisper_loss=0.06589, over 13580.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001746, whisper_loss=0.09221, over 3897681.90 frames. 
], batch size: 57, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:15:45,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1718610.0, ans=0.125 2024-08-12 16:15:45,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1718610.0, ans=0.125 2024-08-12 16:15:58,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.05 vs. limit=22.5 2024-08-12 16:16:00,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1718710.0, ans=0.1 2024-08-12 16:16:33,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-12 16:16:35,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1718910.0, ans=0.125 2024-08-12 16:16:56,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12500, loss[loss=0.1028, beats_loss=0.01236, ecapa_loss=0.0001799, whisper_loss=0.08865, over 19181.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001742, whisper_loss=0.09184, over 3903153.73 frames. ], batch size: 78, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:17:00,257 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 16:17:00,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1719110.0, ans=0.0 2024-08-12 16:17:08,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1719110.0, ans=0.125 2024-08-12 16:17:19,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.385e+01 2.736e+01 3.208e+01 9.127e+01, threshold=5.473e+01, percent-clipped=1.0 2024-08-12 16:17:21,862 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 16:17:39,292 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 16:17:42,997 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 16:18:00,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1719510.0, ans=0.125 2024-08-12 16:18:02,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1719510.0, ans=0.0 2024-08-12 16:18:10,484 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.842e-01 2024-08-12 16:18:12,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-12 16:18:16,534 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12550, loss[loss=0.1027, beats_loss=0.01324, ecapa_loss=0.0001501, whisper_loss=0.08797, over 19918.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01101, ecapa_loss=0.0001739, whisper_loss=0.09183, over 3918373.83 frames. 
], batch size: 81, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:18:17,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1719610.0, ans=0.125 2024-08-12 16:18:45,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1719710.0, ans=0.125 2024-08-12 16:18:49,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1719810.0, ans=0.1 2024-08-12 16:19:13,171 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 16:19:22,540 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 16:19:32,420 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 16:19:35,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-08-12 16:19:38,880 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12600, loss[loss=0.1298, beats_loss=0.009808, ecapa_loss=0.0001928, whisper_loss=0.1181, over 23095.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.000174, whisper_loss=0.092, over 3903257.13 frames. ], batch size: 91, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:19:45,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1720110.0, ans=0.2 2024-08-12 16:20:01,511 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 16:20:03,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.583e+01 2.914e+01 3.404e+01 5.799e+01, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 16:20:06,284 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 16:20:09,721 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 16:20:16,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-12 16:20:31,375 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 16:20:45,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1720510.0, ans=0.2 2024-08-12 16:20:58,938 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12650, loss[loss=0.09061, beats_loss=0.00989, ecapa_loss=0.0001888, whisper_loss=0.07883, over 16666.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.000175, whisper_loss=0.09184, over 3895051.26 frames. 
], batch size: 66, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:21:02,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1720610.0, ans=0.125 2024-08-12 16:21:11,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1720610.0, ans=0.1 2024-08-12 16:21:37,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1720810.0, ans=0.2 2024-08-12 16:21:41,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1720810.0, ans=0.0 2024-08-12 16:21:43,998 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 16:21:50,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0 2024-08-12 16:21:55,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1720910.0, ans=0.0 2024-08-12 16:21:57,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1720910.0, ans=0.2 2024-08-12 16:22:15,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1721110.0, ans=0.0 2024-08-12 16:22:16,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12700, loss[loss=0.1255, beats_loss=0.009149, ecapa_loss=0.0001839, whisper_loss=0.1145, over 23204.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.000176, whisper_loss=0.09189, over 3899573.92 frames. ], batch size: 91, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:22:17,239 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 16:22:23,052 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 16:22:24,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1721110.0, ans=0.2 2024-08-12 16:22:29,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-12 16:22:39,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:22:40,114 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.411e+01 2.657e+01 2.975e+01 5.020e+01, threshold=5.313e+01, percent-clipped=0.0 2024-08-12 16:22:45,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:22:52,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-12 16:22:53,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1721310.0, ans=0.0 2024-08-12 16:22:59,404 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 16:23:21,745 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 16:23:23,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5 2024-08-12 16:23:34,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. 
limit=15.0 2024-08-12 16:23:35,778 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12750, loss[loss=0.09646, beats_loss=0.01015, ecapa_loss=0.0001988, whisper_loss=0.08432, over 17440.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001766, whisper_loss=0.09182, over 3873669.81 frames. ], batch size: 74, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:23:46,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1721610.0, ans=0.0 2024-08-12 16:23:52,442 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 16:23:56,176 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 16:24:07,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1721810.0, ans=0.2 2024-08-12 16:24:13,439 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 16:24:22,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1721910.0, ans=0.2 2024-08-12 16:24:50,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-12 16:24:56,827 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 16:24:57,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12800, loss[loss=0.08807, beats_loss=0.0129, ecapa_loss=0.0001942, whisper_loss=0.07323, over 19589.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01103, ecapa_loss=0.0001764, whisper_loss=0.09187, over 3884940.78 frames. 
], batch size: 85, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:25:12,251 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 16:25:21,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.602e+01 2.886e+01 3.279e+01 7.661e+01, threshold=5.773e+01, percent-clipped=1.0 2024-08-12 16:25:33,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1722310.0, ans=0.125 2024-08-12 16:26:15,748 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 16:26:18,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12850, loss[loss=0.1031, beats_loss=0.01146, ecapa_loss=0.0001803, whisper_loss=0.08979, over 21338.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01102, ecapa_loss=0.000178, whisper_loss=0.09171, over 3890918.51 frames. ], batch size: 85, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:26:19,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-12 16:26:36,448 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 16:26:41,956 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.956e-02 2024-08-12 16:26:44,227 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 16:26:45,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:26:47,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1722710.0, ans=0.125 2024-08-12 16:26:50,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-12 16:27:02,616 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 16:27:15,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1722910.0, ans=0.1 2024-08-12 16:27:29,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1723010.0, ans=0.2 2024-08-12 16:27:40,668 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12900, loss[loss=0.1036, beats_loss=0.01321, ecapa_loss=9.261e-05, whisper_loss=0.08949, over 15314.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001773, whisper_loss=0.09162, over 3853542.28 frames. ], batch size: 55, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:27:49,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2024-08-12 16:27:52,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1723110.0, ans=0.0 2024-08-12 16:27:54,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1723110.0, ans=0.125 2024-08-12 16:28:04,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1723210.0, ans=0.2 2024-08-12 16:28:05,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.675e+01 2.950e+01 4.604e+01, threshold=5.350e+01, percent-clipped=0.0 2024-08-12 16:28:10,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-08-12 16:28:19,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1723310.0, ans=0.95 2024-08-12 16:28:20,619 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 16:28:50,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1723510.0, ans=0.125 2024-08-12 16:29:03,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 12950, loss[loss=0.1087, beats_loss=0.009162, ecapa_loss=0.0001937, whisper_loss=0.0976, over 16705.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01102, ecapa_loss=0.0001773, whisper_loss=0.09084, over 3840775.83 frames. 
], batch size: 65, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:29:27,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1723710.0, ans=0.125 2024-08-12 16:29:39,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-12 16:29:40,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1723810.0, ans=0.5 2024-08-12 16:29:47,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1723810.0, ans=0.125 2024-08-12 16:29:48,775 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 16:29:56,618 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 16:30:08,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1723910.0, ans=0.2 2024-08-12 16:30:18,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1724010.0, ans=0.125 2024-08-12 16:30:27,421 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-12 16:30:30,691 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13000, loss[loss=0.09121, beats_loss=0.01153, ecapa_loss=0.0002406, whisper_loss=0.07727, over 17031.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001774, whisper_loss=0.09153, over 3869012.85 frames. 
], batch size: 73, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:30:45,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1724210.0, ans=0.125 2024-08-12 16:30:55,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.537e+01 2.771e+01 3.073e+01 6.149e+01, threshold=5.541e+01, percent-clipped=2.0 2024-08-12 16:30:57,496 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:30:59,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1724210.0, ans=0.125 2024-08-12 16:31:01,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1724210.0, ans=0.125 2024-08-12 16:31:05,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1724310.0, ans=0.2 2024-08-12 16:31:29,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1724410.0, ans=0.1 2024-08-12 16:31:34,364 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 16:31:54,588 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13050, loss[loss=0.08552, beats_loss=0.0115, ecapa_loss=0.0002111, whisper_loss=0.07191, over 15378.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01102, ecapa_loss=0.0001767, whisper_loss=0.09117, over 3870102.13 frames. 
], batch size: 66, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:32:04,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1724610.0, ans=0.0 2024-08-12 16:32:06,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1724610.0, ans=0.0 2024-08-12 16:32:17,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0 2024-08-12 16:32:31,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1724810.0, ans=0.5 2024-08-12 16:32:41,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1724810.0, ans=0.2 2024-08-12 16:32:49,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1724910.0, ans=0.2 2024-08-12 16:32:59,740 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-12 16:33:03,346 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 16:33:12,938 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 16:33:16,252 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 16:33:17,995 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13100, loss[loss=0.08141, beats_loss=0.0106, ecapa_loss=0.0001921, whisper_loss=0.0689, over 14032.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01106, ecapa_loss=0.0001766, whisper_loss=0.09115, over 3858986.47 frames. 
], batch size: 55, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:33:41,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.633e+01 2.841e+01 3.164e+01 5.259e+01, threshold=5.682e+01, percent-clipped=0.0 2024-08-12 16:33:54,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1725310.0, ans=0.125 2024-08-12 16:34:06,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1725410.0, ans=0.0 2024-08-12 16:34:14,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=10.0 2024-08-12 16:34:20,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1725510.0, ans=0.0 2024-08-12 16:34:31,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1725510.0, ans=0.0 2024-08-12 16:34:35,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1725510.0, ans=0.0 2024-08-12 16:34:38,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13150, loss[loss=0.112, beats_loss=0.009559, ecapa_loss=0.0002191, whisper_loss=0.1003, over 15242.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01107, ecapa_loss=0.000177, whisper_loss=0.09122, over 3848598.73 frames. 
], batch size: 59, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:34:53,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1725610.0, ans=0.125 2024-08-12 16:35:01,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1725710.0, ans=0.125 2024-08-12 16:35:02,420 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 16:35:12,652 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 16:35:17,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-12 16:35:18,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1725810.0, ans=0.125 2024-08-12 16:35:27,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1725810.0, ans=0.1 2024-08-12 16:35:30,064 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 16:35:36,586 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 16:35:38,362 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 16:35:58,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1726010.0, ans=0.125 2024-08-12 16:36:02,209 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13200, loss[loss=0.09895, beats_loss=0.0121, ecapa_loss=0.0002151, whisper_loss=0.08469, over 16230.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001755, whisper_loss=0.09148, over 3858349.89 frames. ], batch size: 68, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:36:09,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1726110.0, ans=0.1 2024-08-12 16:36:20,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2024-08-12 16:36:25,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.556e+01 2.815e+01 3.284e+01 6.256e+01, threshold=5.630e+01, percent-clipped=1.0 2024-08-12 16:36:30,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1726210.0, ans=0.1 2024-08-12 16:36:32,617 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.640e-03 2024-08-12 16:36:57,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1726410.0, ans=0.0 2024-08-12 16:37:07,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1726510.0, ans=0.0 2024-08-12 16:37:21,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1726510.0, ans=0.125 2024-08-12 16:37:24,775 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13250, loss[loss=0.1048, beats_loss=0.009842, ecapa_loss=0.0002099, whisper_loss=0.09281, over 18980.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01103, ecapa_loss=0.0001767, whisper_loss=0.09099, over 3848635.60 frames. ], batch size: 79, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:37:39,267 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-12 16:37:58,495 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 16:38:14,530 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 16:38:42,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1727010.0, ans=0.1 2024-08-12 16:38:49,466 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13300, loss[loss=0.1042, beats_loss=0.008527, ecapa_loss=0.0002181, whisper_loss=0.09348, over 15157.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01103, ecapa_loss=0.0001768, whisper_loss=0.09074, over 3844373.05 frames. ], batch size: 62, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:38:54,357 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 16:38:57,149 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-12 16:39:06,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1727210.0, ans=0.025 2024-08-12 16:39:11,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1727210.0, ans=0.1 2024-08-12 16:39:12,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.550e+01 2.829e+01 3.095e+01 6.127e+01, threshold=5.657e+01, percent-clipped=1.0 2024-08-12 16:39:16,536 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 16:39:23,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1727310.0, ans=0.95 2024-08-12 16:39:41,284 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 16:39:44,578 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 18 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 16:39:52,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1727510.0, ans=15.0 2024-08-12 16:39:56,774 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 16:40:09,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13350, loss[loss=0.1074, beats_loss=0.01081, ecapa_loss=0.0001635, whisper_loss=0.09499, over 19560.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01105, ecapa_loss=0.0001768, whisper_loss=0.09086, over 3845137.20 frames. ], batch size: 77, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:40:15,338 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.701e+01 2024-08-12 16:40:18,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1727610.0, ans=0.0 2024-08-12 16:40:24,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1727610.0, ans=0.0 2024-08-12 16:40:24,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2024-08-12 16:40:40,148 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 16:41:10,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1727910.0, ans=0.1 2024-08-12 16:41:31,281 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13400, loss[loss=0.09209, beats_loss=0.01182, ecapa_loss=0.0001452, whisper_loss=0.07882, over 17640.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01107, ecapa_loss=0.0001757, whisper_loss=0.09103, over 3844914.22 frames. ], batch size: 69, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:41:31,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1728110.0, ans=0.1 2024-08-12 16:41:53,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1728210.0, ans=0.125 2024-08-12 16:41:54,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.756e+01 3.172e+01 3.565e+01 5.325e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-12 16:42:00,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1728210.0, ans=0.125 2024-08-12 16:42:02,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1728310.0, ans=0.125 2024-08-12 16:42:02,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-08-12 16:42:07,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1728310.0, ans=0.0 2024-08-12 16:42:20,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1728410.0, ans=0.125 2024-08-12 16:42:30,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1728410.0, ans=0.125 2024-08-12 16:42:50,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13450, loss[loss=0.1051, beats_loss=0.009076, ecapa_loss=0.0001622, whisper_loss=0.09443, over 15957.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001754, whisper_loss=0.09162, over 3852048.06 frames. ], batch size: 64, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:42:56,249 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 16:43:14,204 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 16:43:20,508 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 16:43:20,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1728710.0, ans=0.125 2024-08-12 16:43:34,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-12 16:43:34,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2024-08-12 16:44:10,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1729010.0, ans=0.0 2024-08-12 16:44:15,014 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13500, loss[loss=0.09156, beats_loss=0.01028, ecapa_loss=0.0001725, whisper_loss=0.07956, over 17400.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.0001763, whisper_loss=0.09129, over 3854612.29 frames. 
], batch size: 70, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:44:17,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1729110.0, ans=10.0 2024-08-12 16:44:17,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1729110.0, ans=0.0 2024-08-12 16:44:18,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1729110.0, ans=0.1 2024-08-12 16:44:22,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1729110.0, ans=0.0 2024-08-12 16:44:23,247 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 22 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 16:44:27,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1729110.0, ans=0.5 2024-08-12 16:44:33,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-12 16:44:36,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1729210.0, ans=0.125 2024-08-12 16:44:37,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-12 16:44:38,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.512e+01 2.797e+01 3.062e+01 5.746e+01, threshold=5.594e+01, percent-clipped=0.0 2024-08-12 16:44:43,408 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 16:45:03,156 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-12 16:45:08,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1729410.0, ans=0.125 2024-08-12 16:45:18,718 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 16:45:21,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1729510.0, ans=0.035 2024-08-12 16:45:27,832 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 16:45:34,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.49 vs. limit=10.0 2024-08-12 16:45:36,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1729510.0, ans=0.0 2024-08-12 16:45:38,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13550, loss[loss=0.1259, beats_loss=0.00927, ecapa_loss=0.0001799, whisper_loss=0.1149, over 22768.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001767, whisper_loss=0.09198, over 3870884.11 frames. ], batch size: 88, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:45:45,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-12 16:46:07,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1729710.0, ans=0.0 2024-08-12 16:46:11,712 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 16:46:19,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1729810.0, ans=0.125 2024-08-12 16:46:23,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-12 16:46:40,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1729910.0, ans=0.125 2024-08-12 16:46:56,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.26 vs. limit=15.0 2024-08-12 16:46:57,697 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 16:47:05,952 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13600, loss[loss=0.1028, beats_loss=0.01195, ecapa_loss=0.0001534, whisper_loss=0.08935, over 23359.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001745, whisper_loss=0.09184, over 3888525.01 frames. ], batch size: 94, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:47:30,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1730210.0, ans=0.0 2024-08-12 16:47:31,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.482e+01 2.733e+01 3.104e+01 2.478e+02, threshold=5.467e+01, percent-clipped=1.0 2024-08-12 16:47:59,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1730410.0, ans=0.0 2024-08-12 16:48:01,302 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
34 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 16:48:02,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=12.0 2024-08-12 16:48:02,997 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 16:48:04,574 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 16:48:09,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1730410.0, ans=0.125 2024-08-12 16:48:11,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730410.0, ans=0.1 2024-08-12 16:48:31,127 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13650, loss[loss=0.1107, beats_loss=0.01174, ecapa_loss=0.0001329, whisper_loss=0.09762, over 18575.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01117, ecapa_loss=0.000173, whisper_loss=0.09124, over 3912218.83 frames. ], batch size: 73, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:48:41,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1730610.0, ans=0.2 2024-08-12 16:48:44,428 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 25 from Vox, 13 fro AS 2024-08-12 16:49:04,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1730810.0, ans=0.125 2024-08-12 16:49:07,885 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 16:49:08,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1730810.0, ans=0.0 2024-08-12 16:49:18,292 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 16:49:18,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1730810.0, ans=0.1 2024-08-12 16:49:26,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-12 16:49:32,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1730910.0, ans=0.09899494936611666 2024-08-12 16:49:38,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730910.0, ans=0.1 2024-08-12 16:49:46,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1731010.0, ans=0.125 2024-08-12 16:49:56,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1731010.0, ans=0.0 2024-08-12 16:50:04,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1731010.0, ans=0.1 2024-08-12 16:50:06,969 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13700, loss[loss=0.136, beats_loss=0.01007, ecapa_loss=0.0001851, whisper_loss=0.1241, over 22462.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.0001744, whisper_loss=0.0919, over 3909636.48 frames. ], batch size: 92, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:50:22,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-12 16:50:30,351 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. 
limit=6.0 2024-08-12 16:50:34,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.487e+01 2.754e+01 3.214e+01 5.264e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 16:50:38,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1731210.0, ans=0.1 2024-08-12 16:50:44,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1731310.0, ans=0.125 2024-08-12 16:51:21,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1731510.0, ans=0.0 2024-08-12 16:51:33,598 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13750, loss[loss=0.1095, beats_loss=0.008419, ecapa_loss=0.0001979, whisper_loss=0.09906, over 13791.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01099, ecapa_loss=0.0001734, whisper_loss=0.09252, over 3892305.21 frames. ], batch size: 54, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:51:53,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1731710.0, ans=0.125 2024-08-12 16:51:58,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-12 16:52:11,823 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-12 16:52:27,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1731910.0, ans=0.0 2024-08-12 16:52:56,105 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 16:52:59,356 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13800, loss[loss=0.1147, beats_loss=0.01141, ecapa_loss=0.0001168, whisper_loss=0.1021, over 22354.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0001721, whisper_loss=0.09265, over 3911453.00 frames. ], batch size: 82, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:53:04,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1732110.0, ans=0.0 2024-08-12 16:53:06,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1732110.0, ans=0.125 2024-08-12 16:53:23,890 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 15 from Vox, 49 from AS 2024-08-12 16:53:24,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.542e+01 2.940e+01 3.312e+01 1.437e+02, threshold=5.879e+01, percent-clipped=2.0 2024-08-12 16:53:44,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1732310.0, ans=0.1 2024-08-12 16:53:46,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1732310.0, ans=0.1 2024-08-12 16:53:57,023 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 16:54:23,113 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 from AS 2024-08-12 16:54:28,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13850, loss[loss=0.113, beats_loss=0.007189, ecapa_loss=0.0001964, whisper_loss=0.1038, over 13598.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01094, ecapa_loss=0.0001732, whisper_loss=0.09286, over 3901420.82 frames. ], batch size: 53, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:54:38,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.88 vs. 
limit=6.0 2024-08-12 16:54:55,699 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 16:55:58,146 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS 2024-08-12 16:55:59,216 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13900, loss[loss=0.08896, beats_loss=0.01157, ecapa_loss=0.0001847, whisper_loss=0.07555, over 22113.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01088, ecapa_loss=0.0001737, whisper_loss=0.09345, over 3897342.31 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:56:03,585 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-12 16:56:11,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1733110.0, ans=0.125 2024-08-12 16:56:25,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.635e+01 2.870e+01 3.246e+01 6.120e+01, threshold=5.740e+01, percent-clipped=1.0 2024-08-12 16:56:29,880 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS 2024-08-12 16:56:52,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1733410.0, ans=0.0 2024-08-12 16:56:54,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1733410.0, ans=0.2 2024-08-12 16:56:56,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-12 16:57:20,381 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 from AS 2024-08-12 16:57:20,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1733610.0, ans=0.125 2024-08-12 16:57:21,627 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 13950, loss[loss=0.1032, beats_loss=0.0116, ecapa_loss=0.0001692, whisper_loss=0.08986, over 22232.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01092, ecapa_loss=0.0001743, whisper_loss=0.09296, over 3879743.41 frames. ], batch size: 89, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:57:39,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-12 16:57:51,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1733710.0, ans=0.1 2024-08-12 16:58:08,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.18 vs. limit=22.5 2024-08-12 16:58:13,652 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 from AS 2024-08-12 16:58:17,272 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 from AS 2024-08-12 16:58:28,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1734010.0, ans=0.125 2024-08-12 16:58:31,803 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 30 from LS+wenet, 17 from Vox, 23 from AS 2024-08-12 16:58:33,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. 
limit=15.0 2024-08-12 16:58:44,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2024-08-12 16:58:44,940 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14000, loss[loss=0.1178, beats_loss=0.009647, ecapa_loss=0.0001661, whisper_loss=0.1064, over 19216.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01085, ecapa_loss=0.0001728, whisper_loss=0.09346, over 3863815.67 frames. ], batch size: 75, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:58:57,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1734110.0, ans=0.125 2024-08-12 16:59:06,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1734210.0, ans=0.125 2024-08-12 16:59:09,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.481e+01 2.768e+01 3.199e+01 7.750e+01, threshold=5.536e+01, percent-clipped=1.0 2024-08-12 16:59:11,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1734210.0, ans=0.0 2024-08-12 16:59:30,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1734310.0, ans=0.125 2024-08-12 16:59:48,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1734410.0, ans=0.1 2024-08-12 16:59:54,538 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 from AS 2024-08-12 17:00:04,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1734510.0, ans=0.09899494936611666 2024-08-12 17:00:10,999 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
16 from LS+wenet, 17 from Vox, 30 from AS 2024-08-12 17:00:14,590 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14050, loss[loss=0.08568, beats_loss=0.01122, ecapa_loss=0.0001773, whisper_loss=0.07269, over 13786.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.000172, whisper_loss=0.09235, over 3839869.03 frames. ], batch size: 57, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:00:28,996 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 from AS 2024-08-12 17:00:33,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1734710.0, ans=0.1 2024-08-12 17:00:43,354 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 from AS 2024-08-12 17:00:53,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1734810.0, ans=0.125 2024-08-12 17:01:17,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1734910.0, ans=0.2 2024-08-12 17:01:27,080 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-12 17:01:41,460 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14100, loss[loss=0.09, beats_loss=0.01087, ecapa_loss=0.0001873, whisper_loss=0.07726, over 20633.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.0001717, whisper_loss=0.09239, over 3853311.38 frames. 
], batch size: 86, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:01:51,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1735110.0, ans=10.0 2024-08-12 17:02:04,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1735210.0, ans=0.0 2024-08-12 17:02:06,073 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 from AS 2024-08-12 17:02:10,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.510e+01 2.862e+01 3.257e+01 4.688e+01, threshold=5.723e+01, percent-clipped=0.0 2024-08-12 17:02:23,379 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 from AS 2024-08-12 17:02:33,410 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 from AS 2024-08-12 17:03:08,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1735510.0, ans=0.04949747468305833 2024-08-12 17:03:10,897 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14150, loss[loss=0.09614, beats_loss=0.01277, ecapa_loss=0.000144, whisper_loss=0.08193, over 21792.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01111, ecapa_loss=0.0001712, whisper_loss=0.09212, over 3850985.19 frames. ], batch size: 87, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:03:14,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1735610.0, ans=0.125 2024-08-12 17:03:19,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1735610.0, ans=0.0 2024-08-12 17:03:23,601 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-12 17:03:23,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1735610.0, ans=0.125 2024-08-12 17:03:23,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1735610.0, ans=0.09899494936611666 2024-08-12 17:04:07,181 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-12 17:04:23,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1735910.0, ans=0.1 2024-08-12 17:04:50,236 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14200, loss[loss=0.1168, beats_loss=0.009582, ecapa_loss=0.0001451, whisper_loss=0.1058, over 23345.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01106, ecapa_loss=0.0001704, whisper_loss=0.09244, over 3898316.08 frames. ], batch size: 89, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:05:02,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1736110.0, ans=0.0 2024-08-12 17:05:09,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. 
limit=15.0 2024-08-12 17:05:11,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1736210.0, ans=0.125 2024-08-12 17:05:14,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.568e+01 2.822e+01 3.210e+01 8.568e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 17:05:43,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1736410.0, ans=0.02 2024-08-12 17:05:51,908 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 17:05:57,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=12.0 2024-08-12 17:06:01,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1736510.0, ans=0.2 2024-08-12 17:06:05,036 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 17:06:07,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1736510.0, ans=0.125 2024-08-12 17:06:10,910 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14250, loss[loss=0.09136, beats_loss=0.01088, ecapa_loss=0.0001592, whisper_loss=0.07889, over 22142.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001698, whisper_loss=0.09205, over 3927871.40 frames. ], batch size: 88, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:06:16,447 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 28 from Vox, 34 from AS 2024-08-12 17:06:20,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1736610.0, ans=0.125 2024-08-12 17:06:53,967 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 17:06:57,807 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 17:07:44,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14300, loss[loss=0.1182, beats_loss=0.01113, ecapa_loss=0.0001857, whisper_loss=0.1052, over 22808.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001703, whisper_loss=0.09159, over 3914194.90 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:08:06,051 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 from AS 2024-08-12 17:08:10,784 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.619e+01 2.822e+01 3.259e+01 8.695e+01, threshold=5.643e+01, percent-clipped=1.0 2024-08-12 17:08:16,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1737210.0, ans=0.125 2024-08-12 17:08:18,340 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 from AS 2024-08-12 17:08:39,257 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS 2024-08-12 17:09:11,218 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14350, loss[loss=0.09148, beats_loss=0.01321, ecapa_loss=0.0001324, whisper_loss=0.07694, over 21618.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001713, whisper_loss=0.09171, over 3901245.82 frames. ], batch size: 83, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:09:11,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1737610.0, ans=0.1 2024-08-12 17:09:12,803 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 from AS 2024-08-12 17:09:14,750 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 17:09:38,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-12 17:09:43,016 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 from AS 2024-08-12 17:09:43,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-12 17:09:48,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1737810.0, ans=0.0 2024-08-12 17:09:48,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-12 17:10:07,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1737910.0, ans=0.125 2024-08-12 17:10:43,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=12.0 2024-08-12 17:10:53,035 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 17:10:57,120 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 from AS 2024-08-12 17:11:00,255 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 17:11:07,080 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
29 from LS+wenet, 19 from Vox, 26 from AS 2024-08-12 17:11:10,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1738010.0, ans=0.0 2024-08-12 17:11:13,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 12, batch 14400, loss[loss=0.07427, beats_loss=0.01331, ecapa_loss=0.0002237, whisper_loss=0.05872, over 21499.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01103, ecapa_loss=0.0001741, whisper_loss=0.09166, over 3931066.35 frames. ], batch size: 94, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:11:16,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1738110.0, ans=0.0 2024-08-12 17:11:24,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1738110.0, ans=0.1 2024-08-12 17:11:26,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1738110.0, ans=0.125 2024-08-12 17:11:33,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1738210.0, ans=0.125 2024-08-12 17:11:36,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1738210.0, ans=0.1 2024-08-12 17:11:44,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.468e+01 2.751e+01 3.183e+01 4.709e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-12 17:12:17,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1738410.0, ans=0.5 2024-08-12 17:12:35,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1738510.0, ans=0.1 2024-08-12 17:12:52,973 INFO [train_multi_KD3.py:1116] 
(1/4) Epoch 12, batch 14450, loss[loss=0.1069, beats_loss=0.01102, ecapa_loss=0.0001465, whisper_loss=0.09443, over 22347.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01111, ecapa_loss=0.0001733, whisper_loss=0.09203, over 3931029.27 frames. ], batch size: 92, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:13:17,601 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 17:13:21,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1738710.0, ans=0.0 2024-08-12 17:13:28,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2024-08-12 17:13:41,711 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 25 from LS+wenet, 12 from Vox, 19 from AS 2024-08-12 17:13:47,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1738910.0, ans=0.0 2024-08-12 17:13:49,743 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 from AS 2024-08-12 17:15:01,338 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 0, loss[loss=0.1245, beats_loss=0.0101, ecapa_loss=0.0001442, whisper_loss=0.1129, over 19396.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.0101, ecapa_loss=0.0001442, whisper_loss=0.1129, over 19396.00 frames. ], batch size: 71, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:15:01,339 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 17:15:45,063 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on ASR_libri: loss=0.255, beats_loss=0, ecapa_loss=0.0005844, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 17:16:01,445 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on SV_voxceleb1: loss=0.004777, beats_loss=0, ecapa_loss=0.0004777, whisper_loss=0, over 939242.00 frames. 
2024-08-12 17:18:04,488 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on AT_audioset: loss=0.02416, beats_loss=0.02416, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 17:18:04,491 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 17:18:12,373 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 17:18:12,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1739080.0, ans=0.0 2024-08-12 17:18:53,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1739180.0, ans=0.125 2024-08-12 17:18:55,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.525e+01 2.835e+01 3.382e+01 8.605e+01, threshold=5.671e+01, percent-clipped=1.0 2024-08-12 17:19:01,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1739280.0, ans=0.125 2024-08-12 17:19:06,762 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS 2024-08-12 17:19:40,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2024-08-12 17:19:50,556 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 10 from Vox, 31 from AS 2024-08-12 17:20:18,944 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 50, loss[loss=0.1087, beats_loss=0.01015, ecapa_loss=0.0001671, whisper_loss=0.09691, over 22486.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.009822, ecapa_loss=0.00018, whisper_loss=0.09531, over 911469.25 frames. 
], batch size: 88, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:20:24,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-08-12 17:20:29,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1739580.0, ans=0.2 2024-08-12 17:20:36,154 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 17:20:40,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1739580.0, ans=0.125 2024-08-12 17:20:48,424 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 14 from Vox, 32 from AS 2024-08-12 17:20:58,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1739680.0, ans=0.2 2024-08-12 17:21:00,852 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 from AS 2024-08-12 17:21:54,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1739980.0, ans=0.95 2024-08-12 17:21:59,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1739980.0, ans=0.125 2024-08-12 17:22:04,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1739980.0, ans=0.0 2024-08-12 17:22:06,706 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 from AS 2024-08-12 17:22:09,403 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
22 from LS+wenet, 12 from Vox, 22 from AS 2024-08-12 17:22:20,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 100, loss[loss=0.1039, beats_loss=0.01043, ecapa_loss=0.0001894, whisper_loss=0.09155, over 18490.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.00992, ecapa_loss=0.0001796, whisper_loss=0.0931, over 1548368.44 frames. ], batch size: 72, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:22:21,441 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:22:31,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-12 17:22:53,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-12 17:23:05,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.865e+01 3.060e+01 3.356e+01 6.213e+01, threshold=6.120e+01, percent-clipped=1.0 2024-08-12 17:23:15,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1740280.0, ans=0.125 2024-08-12 17:23:32,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1740380.0, ans=0.125 2024-08-12 17:23:34,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1740380.0, ans=0.1 2024-08-12 17:23:38,320 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
28 from LS+wenet, 18 from Vox, 34 from AS 2024-08-12 17:23:43,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1740380.0, ans=0.0 2024-08-12 17:23:48,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-12 17:23:53,804 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 from AS 2024-08-12 17:23:56,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1740480.0, ans=0.0 2024-08-12 17:24:05,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1740480.0, ans=0.5 2024-08-12 17:24:07,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1740480.0, ans=0.125 2024-08-12 17:24:15,839 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 150, loss[loss=0.1181, beats_loss=0.007693, ecapa_loss=0.0001745, whisper_loss=0.1086, over 17224.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.009943, ecapa_loss=0.0001755, whisper_loss=0.09363, over 2043326.62 frames. ], batch size: 64, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:24:30,945 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 from AS 2024-08-12 17:24:34,701 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 17:24:40,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1740680.0, ans=0.125 2024-08-12 17:24:55,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1740780.0, ans=0.0 2024-08-12 17:24:56,496 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-12 17:25:05,359 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-12 17:25:11,610 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-12 17:25:40,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.31 vs. limit=22.5 2024-08-12 17:25:42,865 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 200, loss[loss=0.09817, beats_loss=0.01083, ecapa_loss=0.0001219, whisper_loss=0.08612, over 23321.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01018, ecapa_loss=0.0001746, whisper_loss=0.0932, over 2448561.06 frames. 
], batch size: 88, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:25:55,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1741080.0, ans=0.5 2024-08-12 17:26:06,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1741180.0, ans=0.2 2024-08-12 17:26:11,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.594e+01 3.008e+01 3.381e+01 4.307e+01, threshold=6.015e+01, percent-clipped=0.0 2024-08-12 17:26:26,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-12 17:26:28,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-12 17:26:37,826 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 17:26:48,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-12 17:27:00,256 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 250, loss[loss=0.1143, beats_loss=0.009621, ecapa_loss=0.0001744, whisper_loss=0.1029, over 20921.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01038, ecapa_loss=0.0001734, whisper_loss=0.09189, over 2731707.71 frames. ], batch size: 81, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:27:11,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1741580.0, ans=0.1 2024-08-12 17:27:20,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=15.0 2024-08-12 17:27:34,391 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 17:27:40,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=12.0 2024-08-12 17:27:41,653 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 17:27:46,197 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 17:27:47,456 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 17:27:48,910 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 17:28:05,752 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 17:28:17,816 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 300, loss[loss=0.1103, beats_loss=0.008932, ecapa_loss=0.0002155, whisper_loss=0.09916, over 22862.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01055, ecapa_loss=0.0001736, whisper_loss=0.09146, over 2977901.94 frames. ], batch size: 91, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:28:34,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1742180.0, ans=0.125 2024-08-12 17:28:40,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1742180.0, ans=0.0 2024-08-12 17:28:42,845 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 17:28:44,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.349e+01 2.732e+01 3.113e+01 6.634e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-12 17:28:46,549 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
30 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 17:28:49,200 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-12 17:28:50,761 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 17:29:03,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-12 17:29:13,915 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0 2024-08-12 17:29:14,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1742380.0, ans=0.05 2024-08-12 17:29:23,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2024-08-12 17:29:32,755 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 350, loss[loss=0.1301, beats_loss=0.006962, ecapa_loss=0.0001893, whisper_loss=0.1212, over 16086.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.0001733, whisper_loss=0.09198, over 3188364.81 frames. ], batch size: 61, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:29:47,147 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 18 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-12 17:30:04,659 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 17:30:29,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1742980.0, ans=0.125 2024-08-12 17:30:30,082 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
28 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 17:30:44,507 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 400, loss[loss=0.1132, beats_loss=0.01225, ecapa_loss=0.0001127, whisper_loss=0.09982, over 16932.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01061, ecapa_loss=0.0001723, whisper_loss=0.09132, over 3296993.44 frames. ], batch size: 62, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:30:48,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1743080.0, ans=0.125 2024-08-12 17:31:00,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1743180.0, ans=0.0 2024-08-12 17:31:10,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.525e+01 2.765e+01 3.244e+01 1.385e+02, threshold=5.529e+01, percent-clipped=2.0 2024-08-12 17:31:18,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-08-12 17:31:30,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1743380.0, ans=0.09899494936611666 2024-08-12 17:31:30,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1743380.0, ans=0.125 2024-08-12 17:31:39,094 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 17:31:40,253 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 17:31:44,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-12 17:31:51,582 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 17:31:56,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.11 vs. limit=5.0 2024-08-12 17:31:58,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 450, loss[loss=0.1062, beats_loss=0.00969, ecapa_loss=0.0002361, whisper_loss=0.0941, over 20215.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001732, whisper_loss=0.09072, over 3419241.73 frames. ], batch size: 88, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:32:00,527 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 17:32:01,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1743580.0, ans=0.0 2024-08-12 17:32:02,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1743580.0, ans=0.125 2024-08-12 17:32:15,934 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 17:32:22,098 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 17:32:24,844 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 17:32:32,140 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 17:32:48,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1743880.0, ans=0.125 2024-08-12 17:33:10,499 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 17:33:11,482 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 500, loss[loss=0.09906, beats_loss=0.01173, ecapa_loss=0.0001806, whisper_loss=0.08552, over 22913.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001733, whisper_loss=0.09054, over 3551217.68 frames. ], batch size: 92, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:33:21,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1744080.0, ans=0.07 2024-08-12 17:33:28,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1744180.0, ans=0.07 2024-08-12 17:33:39,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1744180.0, ans=0.125 2024-08-12 17:33:40,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.531e+01 2.780e+01 3.170e+01 4.119e+01, threshold=5.561e+01, percent-clipped=0.0 2024-08-12 17:33:54,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1744280.0, ans=0.0 2024-08-12 17:34:11,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1744380.0, ans=0.1 2024-08-12 17:34:12,565 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 17:34:12,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1744380.0, ans=0.125 2024-08-12 17:34:25,200 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 17:34:30,859 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 550, loss[loss=0.1049, beats_loss=0.01244, ecapa_loss=0.0001523, whisper_loss=0.09099, over 19776.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001716, whisper_loss=0.09097, over 3599579.57 frames. 
], batch size: 79, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:34:36,145 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 17:34:39,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-12 17:34:40,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.31 vs. limit=22.5 2024-08-12 17:35:35,050 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 17:35:44,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1745080.0, ans=0.125 2024-08-12 17:35:45,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 600, loss[loss=0.1049, beats_loss=0.01247, ecapa_loss=0.0001477, whisper_loss=0.09096, over 17746.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001695, whisper_loss=0.09063, over 3672486.92 frames. ], batch size: 70, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:36:08,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1745180.0, ans=0.0 2024-08-12 17:36:11,576 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.517e+01 2.834e+01 3.150e+01 6.498e+01, threshold=5.667e+01, percent-clipped=2.0 2024-08-12 17:36:23,991 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 17:36:45,232 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 17:36:46,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1745480.0, ans=0.1 2024-08-12 17:36:49,677 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 17:36:57,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 650, loss[loss=0.09726, beats_loss=0.00823, ecapa_loss=0.000185, whisper_loss=0.08718, over 16851.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001703, whisper_loss=0.09024, over 3705260.48 frames. ], batch size: 63, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:37:06,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1745580.0, ans=0.125 2024-08-12 17:37:13,212 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 17:37:16,339 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 17:37:28,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1745780.0, ans=0.125 2024-08-12 17:37:39,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1745780.0, ans=0.05 2024-08-12 17:37:39,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1745780.0, ans=0.125 2024-08-12 17:37:47,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=10.0 2024-08-12 17:38:08,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1745980.0, ans=0.0 2024-08-12 17:38:10,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 700, loss[loss=0.1387, beats_loss=0.006765, ecapa_loss=0.0002226, whisper_loss=0.1297, over 14169.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001704, whisper_loss=0.09072, over 3699583.48 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:38:16,661 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2024-08-12 17:38:30,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-12 17:38:34,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1746180.0, ans=0.125 2024-08-12 17:38:37,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2024-08-12 17:38:37,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. 
limit=15.0 2024-08-12 17:38:37,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.446e+01 2.651e+01 3.040e+01 5.006e+01, threshold=5.302e+01, percent-clipped=0.0 2024-08-12 17:38:47,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1746280.0, ans=0.1 2024-08-12 17:38:47,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1746280.0, ans=0.0 2024-08-12 17:38:48,562 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 17:38:49,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1746280.0, ans=0.125 2024-08-12 17:38:51,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1746280.0, ans=0.015 2024-08-12 17:38:55,642 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 17:39:14,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1746480.0, ans=0.125 2024-08-12 17:39:14,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1746480.0, ans=10.0 2024-08-12 17:39:24,616 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 750, loss[loss=0.09626, beats_loss=0.007869, ecapa_loss=0.0001791, whisper_loss=0.0866, over 16788.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001701, whisper_loss=0.09079, over 3689320.87 frames. ], batch size: 64, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:39:32,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.03 vs. 
limit=15.0 2024-08-12 17:39:37,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2024-08-12 17:39:39,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.23 vs. limit=10.0 2024-08-12 17:39:50,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1746680.0, ans=0.05 2024-08-12 17:39:52,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1746780.0, ans=0.1 2024-08-12 17:40:01,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1746780.0, ans=0.1 2024-08-12 17:40:02,890 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 17:40:14,434 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 17:40:14,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1746880.0, ans=0.1 2024-08-12 17:40:34,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1746980.0, ans=0.125 2024-08-12 17:40:37,199 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 800, loss[loss=0.1088, beats_loss=0.01143, ecapa_loss=0.0001579, whisper_loss=0.09575, over 17256.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001704, whisper_loss=0.09032, over 3703176.52 frames. 
], batch size: 69, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:40:41,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1747080.0, ans=0.2 2024-08-12 17:40:46,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1747080.0, ans=0.0 2024-08-12 17:40:56,913 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 17:41:03,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.398e+01 2.726e+01 3.050e+01 4.286e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 17:41:17,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1747280.0, ans=0.125 2024-08-12 17:41:27,446 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 17:41:28,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1747380.0, ans=0.125 2024-08-12 17:41:42,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1747480.0, ans=0.125 2024-08-12 17:41:47,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1747480.0, ans=0.0 2024-08-12 17:41:51,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 850, loss[loss=0.106, beats_loss=0.01077, ecapa_loss=0.0001999, whisper_loss=0.09322, over 21281.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001708, whisper_loss=0.09002, over 3734641.56 frames. 
], batch size: 89, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:41:54,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1747580.0, ans=0.1 2024-08-12 17:42:23,552 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 17:42:25,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1747780.0, ans=0.125 2024-08-12 17:42:33,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1747780.0, ans=0.125 2024-08-12 17:42:40,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1747880.0, ans=0.0 2024-08-12 17:42:47,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1747880.0, ans=0.125 2024-08-12 17:42:50,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2024-08-12 17:42:56,177 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 17:43:06,052 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 900, loss[loss=0.07803, beats_loss=0.01336, ecapa_loss=0.000129, whisper_loss=0.06337, over 14908.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001695, whisper_loss=0.09044, over 3780982.40 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:43:32,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.406e+01 2.653e+01 2.914e+01 6.572e+01, threshold=5.306e+01, percent-clipped=1.0 2024-08-12 17:43:41,069 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 17:43:51,198 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 17:43:52,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1748380.0, ans=0.0 2024-08-12 17:44:02,545 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 17:44:05,427 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 17:44:12,492 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 17:44:16,501 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 17:44:17,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 950, loss[loss=0.09005, beats_loss=0.01078, ecapa_loss=0.0001711, whisper_loss=0.07755, over 18189.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001691, whisper_loss=0.09006, over 3779909.39 frames. ], batch size: 71, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:44:30,971 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-12 17:44:32,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.77 vs. limit=10.0 2024-08-12 17:44:36,669 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 17:44:37,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1748680.0, ans=0.1 2024-08-12 17:44:39,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.79 vs. 
limit=15.0 2024-08-12 17:45:03,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2024-08-12 17:45:05,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-12 17:45:13,650 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 17:45:27,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1000, loss[loss=0.1042, beats_loss=0.01019, ecapa_loss=0.0001948, whisper_loss=0.09206, over 19037.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01091, ecapa_loss=0.0001674, whisper_loss=0.08924, over 3781478.39 frames. ], batch size: 77, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:45:34,851 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-12 17:45:53,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.479e+01 2.731e+01 3.171e+01 4.511e+01, threshold=5.462e+01, percent-clipped=0.0 2024-08-12 17:46:09,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1749280.0, ans=0.125 2024-08-12 17:46:12,284 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 17:46:15,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1749380.0, ans=22.5 2024-08-12 17:46:18,326 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 40 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 17:46:36,324 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 17:46:38,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1749480.0, ans=0.0 2024-08-12 17:46:41,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1050, loss[loss=0.1006, beats_loss=0.01173, ecapa_loss=0.0001642, whisper_loss=0.08718, over 17154.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01091, ecapa_loss=0.0001659, whisper_loss=0.08999, over 3810396.17 frames. ], batch size: 69, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:46:42,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2024-08-12 17:46:45,535 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 17:46:48,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1749580.0, ans=0.125 2024-08-12 17:46:59,086 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-12 17:47:56,802 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 17:47:57,857 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1100, loss[loss=0.09435, beats_loss=0.01045, ecapa_loss=0.0001795, whisper_loss=0.08211, over 16434.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01098, ecapa_loss=0.0001657, whisper_loss=0.08979, over 3813431.96 frames. ], batch size: 65, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:48:00,808 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 
14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 17:48:07,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1750080.0, ans=0.1 2024-08-12 17:48:16,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1750180.0, ans=0.1 2024-08-12 17:48:24,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.582e+01 2.825e+01 3.154e+01 4.424e+01, threshold=5.651e+01, percent-clipped=0.0 2024-08-12 17:48:38,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1750280.0, ans=0.0 2024-08-12 17:48:59,382 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:49:05,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-12 17:49:07,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1750480.0, ans=0.125 2024-08-12 17:49:09,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-12 17:49:20,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1750480.0, ans=0.0 2024-08-12 17:49:23,393 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1150, loss[loss=0.1217, beats_loss=0.01075, ecapa_loss=0.0001623, whisper_loss=0.1093, over 22800.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001657, whisper_loss=0.09043, over 3823803.65 frames. 
], batch size: 88, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:49:25,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-12 17:49:27,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1750580.0, ans=0.125 2024-08-12 17:49:43,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-12 17:49:47,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-08-12 17:50:28,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1750880.0, ans=0.1 2024-08-12 17:50:28,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1750880.0, ans=0.1 2024-08-12 17:50:38,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1750980.0, ans=0.0 2024-08-12 17:50:40,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2024-08-12 17:50:41,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1750980.0, ans=0.0 2024-08-12 17:50:51,304 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1200, loss[loss=0.09218, beats_loss=0.01063, ecapa_loss=0.0001711, whisper_loss=0.07984, over 14672.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0109, ecapa_loss=0.0001665, whisper_loss=0.09022, over 3811166.01 frames. 
], batch size: 60, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:50:58,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1751080.0, ans=0.125 2024-08-12 17:51:01,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1751080.0, ans=0.2 2024-08-12 17:51:20,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1751180.0, ans=0.125 2024-08-12 17:51:27,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1751180.0, ans=0.2 2024-08-12 17:51:27,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.378e+01 2.599e+01 3.054e+01 4.994e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-12 17:51:29,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1751180.0, ans=0.0 2024-08-12 17:51:35,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1751280.0, ans=0.125 2024-08-12 17:51:41,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1751280.0, ans=0.0 2024-08-12 17:51:48,631 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 17:52:28,357 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 17:52:37,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1250, loss[loss=0.1186, beats_loss=0.007237, ecapa_loss=0.0001888, whisper_loss=0.1095, over 16718.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0109, ecapa_loss=0.0001674, whisper_loss=0.08999, over 3818811.38 frames. 
], batch size: 63, lr: 4.93e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:52:38,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1751580.0, ans=0.1 2024-08-12 17:52:57,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1751580.0, ans=0.0 2024-08-12 17:52:59,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1751680.0, ans=0.125 2024-08-12 17:53:21,171 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-12 17:53:22,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1751780.0, ans=0.0 2024-08-12 17:54:27,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.47 vs. limit=15.0 2024-08-12 17:54:27,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1300, loss[loss=0.09332, beats_loss=0.009164, ecapa_loss=0.000171, whisper_loss=0.08244, over 13836.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01097, ecapa_loss=0.0001662, whisper_loss=0.09005, over 3817474.63 frames. ], batch size: 54, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:54:41,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1752080.0, ans=0.1 2024-08-12 17:55:06,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.406e+01 2.650e+01 2.964e+01 4.612e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-12 17:55:24,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. 
limit=6.0 2024-08-12 17:55:33,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1752380.0, ans=0.2 2024-08-12 17:55:36,679 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 17:56:01,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1752480.0, ans=0.0 2024-08-12 17:56:13,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1752580.0, ans=0.2 2024-08-12 17:56:14,967 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1350, loss[loss=0.1269, beats_loss=0.01024, ecapa_loss=0.0001443, whisper_loss=0.1152, over 22758.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01106, ecapa_loss=0.000165, whisper_loss=0.08986, over 3821037.17 frames. ], batch size: 90, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:56:43,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-12 17:56:49,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-12 17:57:10,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1752880.0, ans=0.1 2024-08-12 17:57:13,620 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 17:57:35,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1752980.0, ans=0.0 2024-08-12 17:57:38,101 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1400, loss[loss=0.09843, beats_loss=0.01305, ecapa_loss=0.0001248, whisper_loss=0.08413, over 21175.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.0001663, whisper_loss=0.09017, over 3811297.78 frames. ], batch size: 83, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:58:00,862 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 17:58:04,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.419e+01 2.702e+01 3.143e+01 2.017e+02, threshold=5.404e+01, percent-clipped=3.0 2024-08-12 17:58:36,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-12 17:59:02,986 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1450, loss[loss=0.09916, beats_loss=0.01112, ecapa_loss=0.0001304, whisper_loss=0.08674, over 17198.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01095, ecapa_loss=0.0001665, whisper_loss=0.09002, over 3785389.41 frames. ], batch size: 62, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:59:03,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1753580.0, ans=0.0 2024-08-12 17:59:18,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0 2024-08-12 17:59:28,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1753680.0, ans=0.0 2024-08-12 17:59:40,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2024-08-12 17:59:47,195 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 17:59:47,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1753780.0, ans=0.125 2024-08-12 17:59:47,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1753780.0, ans=0.125 2024-08-12 18:00:10,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1753980.0, ans=0.1 2024-08-12 18:00:13,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1753980.0, ans=0.125 2024-08-12 18:00:21,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1500, loss[loss=0.094, beats_loss=0.01293, ecapa_loss=0.0001367, whisper_loss=0.0797, over 17696.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01092, ecapa_loss=0.000166, whisper_loss=0.09039, over 3789209.96 frames. ], batch size: 69, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:00:23,532 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 18:00:44,671 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 18:00:48,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1754180.0, ans=0.125 2024-08-12 18:00:48,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.32 vs. 
limit=15.0 2024-08-12 18:00:50,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.459e+01 2.780e+01 3.185e+01 5.902e+01, threshold=5.561e+01, percent-clipped=1.0 2024-08-12 18:00:56,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-08-12 18:01:05,484 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 18:01:06,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1754280.0, ans=0.2 2024-08-12 18:01:12,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. limit=10.0 2024-08-12 18:01:18,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1754380.0, ans=0.1 2024-08-12 18:01:19,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-12 18:01:35,404 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 18:01:38,260 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 18:01:41,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1550, loss[loss=0.09468, beats_loss=0.009143, ecapa_loss=0.0002101, whisper_loss=0.08344, over 16002.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001665, whisper_loss=0.09076, over 3801405.12 frames. 
], batch size: 66, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:01:48,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1754580.0, ans=0.1 2024-08-12 18:02:05,487 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 18:02:13,477 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 18 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 18:02:23,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1754780.0, ans=0.125 2024-08-12 18:02:27,200 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 18:02:31,801 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 18:02:50,161 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:02:57,065 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 42 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 18:02:57,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1755080.0, ans=0.125 2024-08-12 18:02:58,256 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1600, loss[loss=0.1392, beats_loss=0.007347, ecapa_loss=0.0001882, whisper_loss=0.13, over 24227.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001665, whisper_loss=0.09132, over 3825771.44 frames. ], batch size: 92, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:03:07,541 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 18:03:25,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.499e+01 2.878e+01 3.295e+01 8.050e+01, threshold=5.757e+01, percent-clipped=1.0 2024-08-12 18:03:26,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1755180.0, ans=0.1 2024-08-12 18:03:27,005 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-12 18:03:28,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1755280.0, ans=0.125 2024-08-12 18:03:42,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2024-08-12 18:03:44,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1755380.0, ans=0.0 2024-08-12 18:03:55,158 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 27 from Vox, 18 fro AS 2024-08-12 18:04:06,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1755480.0, ans=0.125 2024-08-12 18:04:08,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. 
limit=15.0 2024-08-12 18:04:10,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1755480.0, ans=0.125 2024-08-12 18:04:10,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1755480.0, ans=0.2 2024-08-12 18:04:14,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1650, loss[loss=0.08248, beats_loss=0.01288, ecapa_loss=0.0001309, whisper_loss=0.06829, over 22744.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001658, whisper_loss=0.09182, over 3795569.53 frames. ], batch size: 90, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:04:18,428 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 18:04:48,465 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 18:04:50,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.21 vs. limit=22.5 2024-08-12 18:05:08,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1755880.0, ans=0.0 2024-08-12 18:05:13,149 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 18:05:17,727 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 18:05:25,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1755980.0, ans=0.95 2024-08-12 18:05:25,213 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.567e+05 2024-08-12 18:05:29,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1700, loss[loss=0.09092, beats_loss=0.00821, ecapa_loss=0.0001518, whisper_loss=0.08119, over 16929.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001653, whisper_loss=0.09105, over 3803754.01 frames. ], batch size: 62, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:05:34,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5 2024-08-12 18:05:35,530 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 30 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 18:05:49,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1756180.0, ans=0.125 2024-08-12 18:05:56,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.398e+01 2.715e+01 2.937e+01 4.103e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 18:06:07,091 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=22.5 2024-08-12 18:06:14,761 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 18:06:42,271 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1750, loss[loss=0.08131, beats_loss=0.01048, ecapa_loss=0.000163, whisper_loss=0.0692, over 15944.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.000165, whisper_loss=0.09037, over 3824749.26 frames. 
], batch size: 60, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:07:24,432 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 18:07:38,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-12 18:07:55,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1800, loss[loss=0.1209, beats_loss=0.01088, ecapa_loss=0.0001635, whisper_loss=0.1084, over 16741.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01086, ecapa_loss=0.0001655, whisper_loss=0.09007, over 3794526.12 frames. ], batch size: 66, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:08:21,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.466e+01 2.734e+01 3.019e+01 6.645e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 18:08:28,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1757280.0, ans=0.0 2024-08-12 18:08:42,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1757380.0, ans=0.125 2024-08-12 18:08:45,186 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 18:08:48,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1757380.0, ans=0.0 2024-08-12 18:08:49,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1757380.0, ans=0.125 2024-08-12 18:08:58,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1757480.0, ans=0.0 2024-08-12 18:09:08,925 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1850, loss[loss=0.1083, beats_loss=0.00963, ecapa_loss=0.0001951, whisper_loss=0.09676, over 22287.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001661, whisper_loss=0.08993, over 3796959.34 frames. ], batch size: 93, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:09:17,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1757580.0, ans=0.1 2024-08-12 18:09:33,670 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 18:09:50,617 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 18:09:52,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1757880.0, ans=0.2 2024-08-12 18:09:53,376 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 10 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 18:09:54,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1757880.0, ans=15.0 2024-08-12 18:10:20,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1900, loss[loss=0.1024, beats_loss=0.01099, ecapa_loss=0.0001739, whisper_loss=0.08966, over 21739.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001671, whisper_loss=0.09021, over 3780190.07 frames. ], batch size: 90, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:10:22,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1758080.0, ans=0.125 2024-08-12 18:10:27,566 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-12 18:10:42,185 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=15.0 2024-08-12 18:10:47,046 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.395e+01 2.725e+01 3.038e+01 6.504e+01, threshold=5.449e+01, percent-clipped=3.0 2024-08-12 18:10:52,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-08-12 18:10:54,388 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-12 18:11:05,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1758380.0, ans=0.125 2024-08-12 18:11:17,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2024-08-12 18:11:20,739 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 18:11:21,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. 
limit=15.0 2024-08-12 18:11:22,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1758480.0, ans=0.0 2024-08-12 18:11:25,372 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 18:11:34,164 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 1950, loss[loss=0.1186, beats_loss=0.007813, ecapa_loss=0.0001592, whisper_loss=0.1092, over 15474.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.0001686, whisper_loss=0.09013, over 3783897.31 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:11:46,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1758580.0, ans=0.1 2024-08-12 18:12:02,518 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 18:12:13,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1758780.0, ans=0.0 2024-08-12 18:12:36,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1758980.0, ans=0.2 2024-08-12 18:12:48,049 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2000, loss[loss=0.09441, beats_loss=0.01052, ecapa_loss=0.0002049, whisper_loss=0.08184, over 15127.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.08984, over 3774722.89 frames. ], batch size: 64, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:12:57,559 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 18:13:06,795 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 18:13:15,376 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.512e+01 2.812e+01 3.299e+01 5.299e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 18:13:30,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1759280.0, ans=0.125 2024-08-12 18:13:33,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759380.0, ans=0.1 2024-08-12 18:13:36,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1759380.0, ans=0.125 2024-08-12 18:14:01,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2050, loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001528, whisper_loss=0.09097, over 15946.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.0001682, whisper_loss=0.09019, over 3817095.67 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:14:06,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1759580.0, ans=0.125 2024-08-12 18:14:30,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759780.0, ans=0.1 2024-08-12 18:14:34,574 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 18:14:40,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1759780.0, ans=0.125 2024-08-12 18:14:50,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.47 vs. 
limit=15.0 2024-08-12 18:14:50,314 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0 2024-08-12 18:14:54,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-12 18:14:59,660 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 18:15:16,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2024-08-12 18:15:18,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2100, loss[loss=0.1093, beats_loss=0.01128, ecapa_loss=0.000162, whisper_loss=0.09645, over 23389.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01101, ecapa_loss=0.0001656, whisper_loss=0.09003, over 3817243.45 frames. ], batch size: 91, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:15:23,494 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 18:15:36,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1760180.0, ans=0.0 2024-08-12 18:15:43,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.436e+01 2.700e+01 3.111e+01 5.079e+01, threshold=5.401e+01, percent-clipped=0.0 2024-08-12 18:15:45,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1760280.0, ans=0.1 2024-08-12 18:15:47,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1760280.0, ans=0.125 2024-08-12 18:15:49,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1760280.0, ans=0.125 2024-08-12 18:15:56,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-12 18:16:06,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1760380.0, ans=0.125 2024-08-12 18:16:30,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2150, loss[loss=0.1021, beats_loss=0.01205, ecapa_loss=0.000194, whisper_loss=0.08807, over 17913.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0111, ecapa_loss=0.0001672, whisper_loss=0.09013, over 3845232.39 frames. ], batch size: 76, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:16:31,674 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 18:16:45,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1760680.0, ans=0.125 2024-08-12 18:16:46,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-12 18:16:54,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1760680.0, ans=0.0 2024-08-12 18:17:37,448 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2200, loss[loss=0.1123, beats_loss=0.009947, ecapa_loss=0.000163, whisper_loss=0.1007, over 14970.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01107, ecapa_loss=0.0001674, whisper_loss=0.09026, over 3812445.76 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:17:37,839 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:17:48,912 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-12 18:18:00,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.455e+01 2.695e+01 3.002e+01 4.139e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-12 18:18:07,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1761280.0, ans=0.2 2024-08-12 18:18:08,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761280.0, ans=0.1 2024-08-12 18:18:11,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1761280.0, ans=0.125 2024-08-12 18:18:13,539 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 18:18:15,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1761380.0, ans=0.0 2024-08-12 18:18:24,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1761380.0, ans=0.1 2024-08-12 18:18:26,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1761380.0, ans=0.2 2024-08-12 18:18:41,608 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 18:18:42,699 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2250, loss[loss=0.0986, beats_loss=0.01073, ecapa_loss=0.0001724, whisper_loss=0.08615, over 21917.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001694, whisper_loss=0.09115, over 3837433.34 frames. ], batch size: 87, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:18:57,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5 2024-08-12 18:18:59,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1761680.0, ans=0.0 2024-08-12 18:19:04,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1761680.0, ans=0.2 2024-08-12 18:19:08,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1761780.0, ans=0.025 2024-08-12 18:19:21,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761880.0, ans=0.1 2024-08-12 18:19:30,196 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-12 18:19:31,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1761880.0, ans=0.5 2024-08-12 18:19:44,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=15.0 2024-08-12 18:19:45,003 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 18:19:47,231 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2300, loss[loss=0.1223, beats_loss=0.01062, ecapa_loss=0.0001325, whisper_loss=0.1104, over 23669.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01101, ecapa_loss=0.0001698, whisper_loss=0.09098, over 3863851.18 frames. ], batch size: 90, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:19:51,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1762080.0, ans=0.0 2024-08-12 18:19:56,354 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 18:20:10,262 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2024-08-12 18:20:10,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.458e+01 2.734e+01 3.155e+01 5.696e+01, threshold=5.468e+01, percent-clipped=1.0 2024-08-12 18:20:26,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-08-12 18:20:42,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2024-08-12 18:20:42,791 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 18:20:44,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1762480.0, ans=0.0 2024-08-12 18:20:52,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2350, loss[loss=0.1089, beats_loss=0.01111, ecapa_loss=0.0001771, whisper_loss=0.09605, over 17677.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001697, whisper_loss=0.09119, over 3829316.57 frames. ], batch size: 72, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:21:07,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1762680.0, ans=0.125 2024-08-12 18:21:13,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1762680.0, ans=0.125 2024-08-12 18:21:27,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1762780.0, ans=0.0 2024-08-12 18:21:29,944 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-12 18:21:37,847 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-12 18:21:51,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1762980.0, ans=0.1 2024-08-12 18:21:58,469 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2400, loss[loss=0.09215, beats_loss=0.01034, ecapa_loss=0.0001909, whisper_loss=0.0799, over 19151.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01098, ecapa_loss=0.0001697, whisper_loss=0.09146, over 3849339.30 frames. 
], batch size: 79, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:22:01,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1763080.0, ans=0.2 2024-08-12 18:22:03,900 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05874495208263397, model_norm_threshold=54.68092727661133 2024-08-12 18:22:04,062 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.484e+05, grad_sumsq=9.566e+04, orig_rms_sq=8.869e+00 2024-08-12 18:22:06,943 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 33 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 18:22:08,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1763080.0, ans=0.125 2024-08-12 18:22:22,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.513e+01 2.845e+01 3.166e+01 9.308e+02, threshold=5.690e+01, percent-clipped=1.0 2024-08-12 18:22:47,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1763380.0, ans=0.125 2024-08-12 18:22:55,527 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 18:23:02,004 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 18:23:04,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2450, loss[loss=0.09489, beats_loss=0.008499, ecapa_loss=0.0001981, whisper_loss=0.08441, over 21200.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001701, whisper_loss=0.09183, over 3848940.02 frames. 
], batch size: 88, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:23:11,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1763580.0, ans=0.1 2024-08-12 18:23:23,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=22.5 2024-08-12 18:23:32,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1763780.0, ans=0.0 2024-08-12 18:23:40,367 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 18:23:47,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-12 18:23:50,408 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 18:24:00,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-12 18:24:03,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1763980.0, ans=0.125 2024-08-12 18:24:06,211 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 18:24:09,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2500, loss[loss=0.1074, beats_loss=0.01065, ecapa_loss=0.0001328, whisper_loss=0.09539, over 17261.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001694, whisper_loss=0.09236, over 3865188.25 frames. ], batch size: 67, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:24:09,841 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 18:24:17,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1764080.0, ans=0.125 2024-08-12 18:24:23,008 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-12 18:24:23,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1764180.0, ans=0.125 2024-08-12 18:24:27,897 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 29 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 18:24:32,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.522e+01 2.839e+01 3.431e+01 9.983e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 18:24:36,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1764280.0, ans=0.125 2024-08-12 18:24:57,909 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 18:24:59,571 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 18:25:09,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1764480.0, ans=0.1 2024-08-12 18:25:15,306 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2550, loss[loss=0.07945, beats_loss=0.01281, ecapa_loss=0.000143, whisper_loss=0.06521, over 16434.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0001683, whisper_loss=0.09281, over 3862040.35 frames. 
], batch size: 66, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:25:17,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1764580.0, ans=0.125 2024-08-12 18:25:22,263 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 18:25:31,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1764680.0, ans=0.2 2024-08-12 18:25:31,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1764680.0, ans=0.2 2024-08-12 18:25:35,247 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 11 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 18:25:37,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1764680.0, ans=0.2 2024-08-12 18:26:05,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1764880.0, ans=0.125 2024-08-12 18:26:11,614 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 18:26:13,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-12 18:26:20,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2600, loss[loss=0.1224, beats_loss=0.007147, ecapa_loss=0.0002306, whisper_loss=0.113, over 18589.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01081, ecapa_loss=0.0001689, whisper_loss=0.09316, over 3862013.74 frames. 
], batch size: 75, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:26:33,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1765180.0, ans=0.0 2024-08-12 18:26:38,471 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 18:26:43,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.522e+01 2.874e+01 3.178e+01 1.791e+02, threshold=5.747e+01, percent-clipped=2.0 2024-08-12 18:26:56,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5 2024-08-12 18:26:58,242 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 18:27:06,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1765380.0, ans=0.2 2024-08-12 18:27:06,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1765380.0, ans=0.0 2024-08-12 18:27:13,670 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-12 18:27:25,514 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2650, loss[loss=0.1191, beats_loss=0.009181, ecapa_loss=0.0002427, whisper_loss=0.1075, over 21934.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.00017, whisper_loss=0.09199, over 3854752.69 frames. ], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:27:29,748 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 18:27:36,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1765580.0, ans=0.0 2024-08-12 18:27:37,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1765680.0, ans=0.125 2024-08-12 18:27:40,064 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 18:27:41,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1765680.0, ans=0.95 2024-08-12 18:27:44,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=15.0 2024-08-12 18:28:01,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1765780.0, ans=0.0 2024-08-12 18:28:06,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-12 18:28:07,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1765880.0, ans=0.2 2024-08-12 18:28:19,539 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-12 18:28:23,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. 
limit=15.0 2024-08-12 18:28:24,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1765980.0, ans=0.0 2024-08-12 18:28:28,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1765980.0, ans=0.1 2024-08-12 18:28:31,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2700, loss[loss=0.09997, beats_loss=0.01281, ecapa_loss=0.0001848, whisper_loss=0.08531, over 22381.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001711, whisper_loss=0.09178, over 3881142.50 frames. ], batch size: 94, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:28:33,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1766080.0, ans=0.2 2024-08-12 18:28:46,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1766180.0, ans=0.2 2024-08-12 18:28:47,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1766180.0, ans=0.05 2024-08-12 18:28:51,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1766180.0, ans=0.125 2024-08-12 18:28:54,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.345e+01 2.624e+01 3.036e+01 4.476e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 18:28:58,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1766280.0, ans=0.0 2024-08-12 18:29:26,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1766480.0, ans=0.125 2024-08-12 18:29:26,820 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1766480.0, ans=0.125 2024-08-12 18:29:28,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1766480.0, ans=0.0 2024-08-12 18:29:36,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2750, loss[loss=0.08244, beats_loss=0.01172, ecapa_loss=0.0001903, whisper_loss=0.06881, over 21434.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001703, whisper_loss=0.09077, over 3860662.76 frames. ], batch size: 92, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:29:54,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1766680.0, ans=0.125 2024-08-12 18:29:59,508 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 18:30:09,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2024-08-12 18:30:10,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1766780.0, ans=0.0 2024-08-12 18:30:11,574 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 18:30:42,384 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2800, loss[loss=0.09645, beats_loss=0.01218, ecapa_loss=0.0001775, whisper_loss=0.0825, over 22353.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001699, whisper_loss=0.09191, over 3894482.18 frames. ], batch size: 91, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:30:44,404 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2024-08-12 18:30:50,157 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 18:30:57,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1767180.0, ans=0.125 2024-08-12 18:31:06,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.517e+01 2.667e+01 2.964e+01 5.320e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-12 18:31:07,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2024-08-12 18:31:09,111 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-12 18:31:11,767 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 18:31:18,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1767280.0, ans=0.0 2024-08-12 18:31:21,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1767380.0, ans=0.125 2024-08-12 18:31:48,885 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2850, loss[loss=0.1042, beats_loss=0.009882, ecapa_loss=0.0001742, whisper_loss=0.09258, over 18102.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001705, whisper_loss=0.09239, over 3865546.36 frames. ], batch size: 72, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:31:49,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1767580.0, ans=0.1 2024-08-12 18:31:55,335 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 18:32:00,616 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 18:32:08,461 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 18:32:08,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1767680.0, ans=0.125 2024-08-12 18:32:23,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-08-12 18:32:24,284 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 18:32:28,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1767880.0, ans=0.125 2024-08-12 18:32:36,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1767880.0, ans=0.0 2024-08-12 18:32:53,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2900, loss[loss=0.1158, beats_loss=0.01207, ecapa_loss=0.0001402, whisper_loss=0.1023, over 22770.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001716, whisper_loss=0.09213, over 3861713.44 frames. ], batch size: 90, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:32:58,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1768080.0, ans=0.1 2024-08-12 18:33:11,382 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 18:33:14,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1768180.0, ans=0.125 2024-08-12 18:33:18,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1768180.0, ans=0.0 2024-08-12 18:33:19,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.482e+01 2.869e+01 3.422e+01 8.599e+01, threshold=5.738e+01, percent-clipped=1.0 2024-08-12 18:33:26,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1768280.0, ans=0.2 2024-08-12 18:33:31,889 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.212e+02 2024-08-12 18:33:35,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1768380.0, ans=0.0 2024-08-12 18:33:40,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1768380.0, ans=0.125 2024-08-12 18:33:47,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1768480.0, ans=0.0 2024-08-12 18:33:51,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1768480.0, ans=0.125 2024-08-12 18:33:56,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-12 18:34:00,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 2950, loss[loss=0.1278, beats_loss=0.009988, ecapa_loss=0.0001412, whisper_loss=0.1164, over 23977.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01101, ecapa_loss=0.0001732, whisper_loss=0.09207, over 3893841.84 frames. 
], batch size: 86, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:34:15,340 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 18:34:23,433 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 18:34:37,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1768780.0, ans=0.125 2024-08-12 18:34:39,472 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 18:34:53,311 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 18:34:56,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1768980.0, ans=22.5 2024-08-12 18:35:10,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3000, loss[loss=0.1054, beats_loss=0.008396, ecapa_loss=0.0001829, whisper_loss=0.09521, over 14538.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01092, ecapa_loss=0.000173, whisper_loss=0.09317, over 3918077.08 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:35:10,469 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 18:35:44,275 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6098, 3.5072, 4.3300, 4.0986], device='cuda:1') 2024-08-12 18:35:46,422 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005879, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 18:36:04,668 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on SV_voxceleb1: loss=0.004639, beats_loss=0, ecapa_loss=0.0004639, whisper_loss=0, over 939242.00 frames. 
2024-08-12 18:37:44,631 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1273, 3.3365, 3.5014, 2.9696], device='cuda:1') 2024-08-12 18:37:53,553 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 18:37:53,559 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 18:37:56,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0 2024-08-12 18:38:03,132 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:38:13,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2024-08-12 18:38:18,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.438e+01 2.713e+01 3.016e+01 4.001e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-12 18:38:26,532 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 18:38:37,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1769380.0, ans=0.125 2024-08-12 18:38:43,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=12.0 2024-08-12 18:38:44,196 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 18:38:47,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1769480.0, ans=0.1 2024-08-12 18:38:52,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1769480.0, ans=0.1 2024-08-12 18:38:59,980 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3050, loss[loss=0.1021, beats_loss=0.01247, ecapa_loss=0.0001679, whisper_loss=0.088, over 17544.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01098, ecapa_loss=0.0001725, whisper_loss=0.09296, over 3910159.39 frames. ], batch size: 71, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:39:00,406 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 18:39:05,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.57 vs. limit=22.5 2024-08-12 18:39:19,645 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:39:39,031 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 18:39:48,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1769880.0, ans=0.1 2024-08-12 18:39:57,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1769980.0, ans=0.1 2024-08-12 18:40:01,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1769980.0, ans=0.1 2024-08-12 18:40:07,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1769980.0, ans=0.0 2024-08-12 18:40:09,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3100, loss[loss=0.1033, beats_loss=0.008826, ecapa_loss=0.000174, whisper_loss=0.09276, over 18264.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01094, ecapa_loss=0.0001746, whisper_loss=0.0934, over 3922268.34 frames. ], batch size: 67, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:40:09,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1770080.0, ans=0.0 2024-08-12 18:40:11,025 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 18:40:19,569 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 18:40:19,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1770080.0, ans=0.125 2024-08-12 18:40:36,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.497e+01 2.868e+01 3.286e+01 7.289e+01, threshold=5.737e+01, percent-clipped=2.0 2024-08-12 18:40:53,293 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 18:41:07,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1770480.0, ans=0.1 2024-08-12 18:41:12,811 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 18:41:21,136 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3150, loss[loss=0.1097, beats_loss=0.009639, ecapa_loss=0.0002346, whisper_loss=0.09776, over 17178.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001739, whisper_loss=0.09231, over 3902382.79 frames. ], batch size: 71, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:41:32,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1770580.0, ans=0.0 2024-08-12 18:41:45,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1770680.0, ans=0.125 2024-08-12 18:41:48,229 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 18:41:49,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2024-08-12 18:42:24,285 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 18:42:24,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1770980.0, ans=0.1 2024-08-12 18:42:34,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3200, loss[loss=0.09105, beats_loss=0.009141, ecapa_loss=0.0002078, whisper_loss=0.07983, over 14687.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01098, ecapa_loss=0.0001747, whisper_loss=0.09277, over 3900665.03 frames. 
], batch size: 59, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:42:50,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=12.0 2024-08-12 18:42:52,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=12.0 2024-08-12 18:43:02,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.434e+01 2.699e+01 3.191e+01 8.641e+01, threshold=5.397e+01, percent-clipped=3.0 2024-08-12 18:43:03,045 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 39 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 18:43:06,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1771280.0, ans=0.125 2024-08-12 18:43:46,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3250, loss[loss=0.1161, beats_loss=0.01131, ecapa_loss=0.0001789, whisper_loss=0.103, over 22825.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01093, ecapa_loss=0.000174, whisper_loss=0.09342, over 3904604.20 frames. ], batch size: 91, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:43:57,459 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 18:44:02,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2024-08-12 18:44:03,652 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-12 18:44:07,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1771680.0, ans=0.125 2024-08-12 18:44:16,315 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
15 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 18:44:26,946 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.52 vs. limit=10.0 2024-08-12 18:44:46,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1771980.0, ans=0.125 2024-08-12 18:44:58,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3300, loss[loss=0.08843, beats_loss=0.01262, ecapa_loss=0.0001768, whisper_loss=0.07404, over 16352.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001731, whisper_loss=0.09326, over 3904612.39 frames. ], batch size: 67, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:45:09,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1772080.0, ans=0.125 2024-08-12 18:45:12,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1772180.0, ans=0.2 2024-08-12 18:45:21,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1772180.0, ans=0.0 2024-08-12 18:45:26,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.521e+01 2.800e+01 3.274e+01 5.621e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 18:45:27,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2024-08-12 18:45:35,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1772280.0, ans=0.0 2024-08-12 18:45:44,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1772380.0, ans=0.125 2024-08-12 18:45:50,610 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 18:45:56,964 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 18:45:57,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1772480.0, ans=0.1 2024-08-12 18:45:57,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1772480.0, ans=0.125 2024-08-12 18:45:58,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-12 18:46:08,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1772480.0, ans=0.05 2024-08-12 18:46:11,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3350, loss[loss=0.09487, beats_loss=0.009574, ecapa_loss=0.0002093, whisper_loss=0.08321, over 18434.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01094, ecapa_loss=0.0001734, whisper_loss=0.09293, over 3907940.04 frames. ], batch size: 77, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:46:16,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-12 18:46:32,637 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
29 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 18:46:37,329 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 18:46:45,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1772780.0, ans=0.0 2024-08-12 18:46:48,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1772780.0, ans=0.0 2024-08-12 18:46:51,082 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 18:47:04,446 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 18:47:12,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0 2024-08-12 18:47:22,793 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3400, loss[loss=0.1267, beats_loss=0.008351, ecapa_loss=0.0001691, whisper_loss=0.1167, over 22966.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001738, whisper_loss=0.09247, over 3912811.46 frames. ], batch size: 88, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:47:25,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. 
limit=22.5 2024-08-12 18:47:34,908 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:47:50,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.407e+01 2.669e+01 3.067e+01 7.735e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-12 18:47:51,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1773280.0, ans=0.1 2024-08-12 18:47:56,532 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 18:48:15,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=12.0 2024-08-12 18:48:28,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1773480.0, ans=0.02 2024-08-12 18:48:36,958 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3450, loss[loss=0.1038, beats_loss=0.01155, ecapa_loss=0.0001823, whisper_loss=0.09037, over 22740.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001746, whisper_loss=0.0922, over 3912069.20 frames. ], batch size: 92, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:48:49,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=12.0 2024-08-12 18:48:56,953 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 18:49:02,353 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
34 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 18:49:24,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1773880.0, ans=0.05 2024-08-12 18:49:47,618 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3500, loss[loss=0.1062, beats_loss=0.01275, ecapa_loss=0.0001673, whisper_loss=0.09178, over 22242.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001754, whisper_loss=0.09241, over 3914913.06 frames. ], batch size: 92, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:49:48,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2024-08-12 18:49:49,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1774080.0, ans=0.0 2024-08-12 18:49:49,771 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=15.0 2024-08-12 18:49:56,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2024-08-12 18:50:03,555 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 18:50:14,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.564e+01 2.746e+01 3.042e+01 5.198e+01, threshold=5.491e+01, percent-clipped=0.0 2024-08-12 18:50:22,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1774280.0, ans=0.125 2024-08-12 18:50:25,824 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-12 18:50:41,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1774380.0, ans=0.125 2024-08-12 18:50:44,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1774480.0, ans=0.1 2024-08-12 18:50:46,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1774480.0, ans=0.05 2024-08-12 18:50:54,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1774480.0, ans=0.125 2024-08-12 18:50:58,432 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3550, loss[loss=0.1422, beats_loss=0.00698, ecapa_loss=0.000196, whisper_loss=0.1333, over 23489.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.000174, whisper_loss=0.09256, over 3893841.13 frames. ], batch size: 87, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:51:00,626 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 18:51:11,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1774680.0, ans=0.0 2024-08-12 18:51:17,334 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 18:51:30,265 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 18:51:36,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1774780.0, ans=10.0 2024-08-12 18:51:43,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1774880.0, ans=0.125 2024-08-12 18:51:52,656 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 18:51:58,303 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-12 18:51:58,823 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-12 18:52:01,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1774980.0, ans=0.1 2024-08-12 18:52:11,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3600, loss[loss=0.09257, beats_loss=0.01053, ecapa_loss=0.0001727, whisper_loss=0.08031, over 13655.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01096, ecapa_loss=0.0001717, whisper_loss=0.09127, over 3887829.94 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:52:22,583 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 18:52:22,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1775080.0, ans=0.125 2024-08-12 18:52:22,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1775080.0, ans=0.0 2024-08-12 18:52:38,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.432e+01 2.743e+01 3.098e+01 5.002e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-12 18:52:59,177 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 18:53:03,256 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 18:53:03,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1775380.0, ans=0.125 2024-08-12 18:53:23,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3650, loss[loss=0.1116, beats_loss=0.01056, ecapa_loss=0.0001785, whisper_loss=0.0993, over 14454.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001714, whisper_loss=0.09073, over 3868540.86 frames. ], batch size: 60, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:54:06,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1775880.0, ans=0.1 2024-08-12 18:54:30,410 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.209e-02 2024-08-12 18:54:31,493 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 18:54:36,028 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3700, loss[loss=0.1093, beats_loss=0.01037, ecapa_loss=0.0001606, whisper_loss=0.09736, over 21486.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001713, whisper_loss=0.09105, over 3861219.04 frames. ], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:55:00,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1776180.0, ans=0.1 2024-08-12 18:55:03,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.392e+01 2.654e+01 3.110e+01 5.350e+01, threshold=5.308e+01, percent-clipped=0.0 2024-08-12 18:55:12,587 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 27 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 18:55:13,995 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 18:55:14,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1776280.0, ans=0.0 2024-08-12 18:55:19,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1776380.0, ans=0.125 2024-08-12 18:55:45,690 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 18:55:47,002 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 18:55:47,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1776580.0, ans=0.0 2024-08-12 18:55:48,270 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3750, loss[loss=0.1077, beats_loss=0.01188, ecapa_loss=0.0001541, whisper_loss=0.09432, over 19729.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001724, whisper_loss=0.09144, over 3860945.47 frames. ], batch size: 76, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:55:51,310 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 18:55:55,941 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 18:56:01,751 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:56:01,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1776680.0, ans=0.2 2024-08-12 18:56:18,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1776780.0, ans=0.1 2024-08-12 18:56:44,676 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 18:56:45,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-12 18:56:49,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1776980.0, ans=0.125 2024-08-12 18:56:55,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1776980.0, ans=0.125 2024-08-12 18:56:57,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1776980.0, ans=0.125 2024-08-12 18:57:03,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3800, loss[loss=0.0977, beats_loss=0.01003, ecapa_loss=0.000132, whisper_loss=0.08635, over 23402.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01095, ecapa_loss=0.0001726, whisper_loss=0.09226, over 3877955.12 frames. 
], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:57:06,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1777080.0, ans=0.125 2024-08-12 18:57:11,927 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-12 18:57:27,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1777180.0, ans=0.0 2024-08-12 18:57:27,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1777180.0, ans=0.125 2024-08-12 18:57:31,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.486e+01 2.799e+01 3.183e+01 6.177e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 18:57:38,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-12 18:57:45,638 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 18:57:55,898 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 18:58:04,177 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 18:58:15,718 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 18:58:19,767 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3850, loss[loss=0.1201, beats_loss=0.008998, ecapa_loss=0.0001849, whisper_loss=0.1092, over 22120.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001734, whisper_loss=0.09148, over 3838298.61 frames. ], batch size: 86, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:58:19,950 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 18:58:38,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1777680.0, ans=0.1 2024-08-12 18:58:43,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1777680.0, ans=0.125 2024-08-12 18:59:14,786 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 18:59:17,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-12 18:59:35,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-08-12 18:59:36,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3900, loss[loss=0.06524, beats_loss=0.01346, ecapa_loss=0.0001377, whisper_loss=0.0504, over 14425.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001735, whisper_loss=0.09149, over 3804682.56 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:59:56,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1778180.0, ans=0.125 2024-08-12 19:00:00,945 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 19:00:05,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.460e+01 2.720e+01 3.134e+01 5.284e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 19:00:05,482 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 19:00:06,980 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 19:00:13,303 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 19:00:32,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1778380.0, ans=0.125 2024-08-12 19:00:38,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1778480.0, ans=0.125 2024-08-12 19:00:40,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2024-08-12 19:00:53,601 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 3950, loss[loss=0.1173, beats_loss=0.01005, ecapa_loss=0.0001805, whisper_loss=0.1054, over 22263.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01082, ecapa_loss=0.000175, whisper_loss=0.09205, over 3843705.01 frames. ], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:00:56,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-12 19:00:57,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1778580.0, ans=0.125 2024-08-12 19:01:00,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1778580.0, ans=0.2 2024-08-12 19:01:04,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=12.0 2024-08-12 19:01:07,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. 
limit=15.0 2024-08-12 19:01:10,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1778680.0, ans=0.125 2024-08-12 19:01:12,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2024-08-12 19:01:21,352 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 19:01:26,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1778780.0, ans=0.2 2024-08-12 19:01:32,329 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 19:01:35,323 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 19:01:45,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1778880.0, ans=0.125 2024-08-12 19:01:48,053 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 35 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-12 19:01:54,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-12 19:02:04,458 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 19:02:08,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4000, loss[loss=0.08386, beats_loss=0.01153, ecapa_loss=0.0002056, whisper_loss=0.07028, over 20592.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001731, whisper_loss=0.09169, over 3843418.02 frames. 
], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:02:20,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-12 19:02:37,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1779180.0, ans=0.2 2024-08-12 19:02:39,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.414e+01 2.670e+01 2.988e+01 4.666e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-12 19:02:41,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1779280.0, ans=0.125 2024-08-12 19:02:45,876 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 19:02:55,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1779380.0, ans=0.0 2024-08-12 19:03:00,091 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.370e-02 2024-08-12 19:03:06,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1779380.0, ans=0.0 2024-08-12 19:03:23,945 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 19:03:29,167 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4050, loss[loss=0.09612, beats_loss=0.01117, ecapa_loss=0.0002064, whisper_loss=0.08289, over 16295.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001729, whisper_loss=0.09135, over 3819238.80 frames. 
], batch size: 67, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:03:47,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1779680.0, ans=0.125 2024-08-12 19:04:03,002 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-12 19:04:10,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1779780.0, ans=0.125 2024-08-12 19:04:10,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-12 19:04:11,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1779780.0, ans=0.0 2024-08-12 19:04:20,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1779880.0, ans=0.1 2024-08-12 19:04:22,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-12 19:04:25,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1779880.0, ans=0.02 2024-08-12 19:04:28,505 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 19:04:48,255 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4100, loss[loss=0.1227, beats_loss=0.01164, ecapa_loss=0.0001866, whisper_loss=0.1092, over 23061.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001729, whisper_loss=0.09211, over 3853184.88 frames. 
], batch size: 93, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:04:51,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1780080.0, ans=0.0 2024-08-12 19:04:55,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1780080.0, ans=0.1 2024-08-12 19:04:55,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1780080.0, ans=0.125 2024-08-12 19:04:59,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1780080.0, ans=0.125 2024-08-12 19:05:02,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1780180.0, ans=0.125 2024-08-12 19:05:16,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.487e+01 2.905e+01 3.188e+01 5.523e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-12 19:05:28,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1780280.0, ans=0.07 2024-08-12 19:05:47,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1780380.0, ans=0.0 2024-08-12 19:05:53,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1780480.0, ans=0.125 2024-08-12 19:06:07,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4150, loss[loss=0.1032, beats_loss=0.00984, ecapa_loss=0.000208, whisper_loss=0.09129, over 21446.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001746, whisper_loss=0.09183, over 3857729.47 frames. 
], batch size: 90, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:06:21,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1780680.0, ans=0.125 2024-08-12 19:06:23,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-12 19:06:35,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1780680.0, ans=0.125 2024-08-12 19:06:49,993 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 19:06:59,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-12 19:07:16,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1780980.0, ans=0.1 2024-08-12 19:07:18,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1780980.0, ans=0.125 2024-08-12 19:07:24,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1780980.0, ans=15.0 2024-08-12 19:07:26,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4200, loss[loss=0.1104, beats_loss=0.008059, ecapa_loss=0.0002213, whisper_loss=0.1002, over 18465.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001747, whisper_loss=0.09199, over 3869270.84 frames. 
], batch size: 75, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:07:28,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1781080.0, ans=0.125 2024-08-12 19:07:38,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1781080.0, ans=0.0 2024-08-12 19:07:55,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.05 vs. limit=10.0 2024-08-12 19:07:56,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.440e+01 2.909e+01 3.594e+01 1.116e+02, threshold=5.819e+01, percent-clipped=3.0 2024-08-12 19:07:57,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1781280.0, ans=0.0 2024-08-12 19:08:11,279 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 19:08:33,341 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 30 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 19:08:49,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4250, loss[loss=0.1098, beats_loss=0.007767, ecapa_loss=0.000177, whisper_loss=0.1003, over 18393.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001737, whisper_loss=0.09128, over 3860434.68 frames. ], batch size: 73, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:08:58,619 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 27 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-12 19:09:10,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. 
limit=15.0 2024-08-12 19:09:10,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1781680.0, ans=0.125 2024-08-12 19:09:35,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1781880.0, ans=0.0 2024-08-12 19:09:40,037 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 19:10:02,380 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 19:10:08,385 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4300, loss[loss=0.1117, beats_loss=0.01186, ecapa_loss=0.0001717, whisper_loss=0.09809, over 23889.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001736, whisper_loss=0.09176, over 3851740.95 frames. ], batch size: 94, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:10:08,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1782080.0, ans=0.1 2024-08-12 19:10:11,688 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 19:10:13,866 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-12 19:10:19,754 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 19:10:30,683 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:10:37,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.369e+01 2.676e+01 2.998e+01 4.612e+01, threshold=5.352e+01, percent-clipped=0.0 2024-08-12 19:10:42,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1782280.0, ans=0.125 2024-08-12 19:10:43,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1782280.0, ans=0.04949747468305833 2024-08-12 19:10:52,364 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-12 19:10:57,962 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 19:11:06,319 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 19:11:11,297 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-12 19:11:22,021 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 19:11:25,680 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 19:11:27,234 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4350, loss[loss=0.1006, beats_loss=0.009672, ecapa_loss=0.000233, whisper_loss=0.08858, over 13491.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001739, whisper_loss=0.09187, over 3826755.91 frames. 
], batch size: 57, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:11:32,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1782580.0, ans=0.0 2024-08-12 19:11:35,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-12 19:11:42,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1782680.0, ans=0.0 2024-08-12 19:12:16,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1782880.0, ans=0.125 2024-08-12 19:12:21,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-12 19:12:21,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2024-08-12 19:12:30,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1782980.0, ans=0.2 2024-08-12 19:12:41,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1782980.0, ans=0.1 2024-08-12 19:12:48,337 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 19:12:49,810 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4400, loss[loss=0.09608, beats_loss=0.0135, ecapa_loss=0.0001802, whisper_loss=0.08078, over 20687.00 frames. ], tot_loss[loss=0.104, beats_loss=0.011, ecapa_loss=0.0001734, whisper_loss=0.09125, over 3836755.08 frames. 
], batch size: 87, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:12:52,749 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 19:13:06,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1783180.0, ans=0.125 2024-08-12 19:13:06,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1783180.0, ans=0.125 2024-08-12 19:13:06,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1783180.0, ans=0.125 2024-08-12 19:13:21,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.411e+01 2.660e+01 2.962e+01 4.713e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-12 19:13:30,108 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 19:13:44,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-08-12 19:13:45,981 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 19:13:50,939 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 19:14:13,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4450, loss[loss=0.1234, beats_loss=0.008497, ecapa_loss=0.0001898, whisper_loss=0.1131, over 19401.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.000173, whisper_loss=0.09128, over 3857031.53 frames. 
], batch size: 76, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:14:27,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1783580.0, ans=0.125 2024-08-12 19:14:48,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2024-08-12 19:15:11,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2024-08-12 19:15:27,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1783980.0, ans=0.125 2024-08-12 19:15:31,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1783980.0, ans=0.125 2024-08-12 19:15:33,019 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 19:15:35,908 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 19:15:41,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4500, loss[loss=0.1011, beats_loss=0.009363, ecapa_loss=0.0001913, whisper_loss=0.08987, over 18113.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001727, whisper_loss=0.09188, over 3863998.35 frames. ], batch size: 76, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:15:56,043 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 19:16:01,237 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 19:16:03,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-08-12 19:16:13,382 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.482e+01 2.920e+01 3.537e+01 6.104e+01, threshold=5.841e+01, percent-clipped=3.0 2024-08-12 19:16:20,480 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 19:16:29,318 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:16:39,952 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 19:16:47,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1784480.0, ans=0.125 2024-08-12 19:16:48,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1784480.0, ans=0.0 2024-08-12 19:16:49,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-08-12 19:16:54,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1784480.0, ans=0.07 2024-08-12 19:16:54,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1784480.0, ans=0.2 2024-08-12 19:16:58,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1784480.0, ans=0.2 2024-08-12 19:17:00,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2024-08-12 19:17:04,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1784480.0, ans=0.125 2024-08-12 19:17:07,603 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4550, loss[loss=0.09137, beats_loss=0.01021, ecapa_loss=0.0001649, whisper_loss=0.07951, over 18222.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001732, whisper_loss=0.09171, over 3885224.82 frames. ], batch size: 68, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:17:20,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1784580.0, ans=0.125 2024-08-12 19:17:43,172 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 24 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-12 19:17:46,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2024-08-12 19:17:50,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1784780.0, ans=0.125 2024-08-12 19:17:56,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1784780.0, ans=0.125 2024-08-12 19:17:58,183 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-12 19:18:33,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4600, loss[loss=0.1133, beats_loss=0.01044, ecapa_loss=0.0001914, whisper_loss=0.1009, over 16528.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.0001727, whisper_loss=0.09175, over 3883726.17 frames. ], batch size: 68, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:18:43,750 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 19:19:04,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.452e+01 2.765e+01 3.164e+01 4.953e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-12 19:19:09,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=12.0 2024-08-12 19:19:29,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1785380.0, ans=0.1 2024-08-12 19:19:35,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1785480.0, ans=0.0 2024-08-12 19:19:44,597 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-12 19:19:52,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4650, loss[loss=0.1083, beats_loss=0.009748, ecapa_loss=0.0002002, whisper_loss=0.09657, over 23103.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.0001723, whisper_loss=0.09133, over 3892107.24 frames. ], batch size: 94, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:20:01,985 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 19:20:04,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1785580.0, ans=0.125 2024-08-12 19:20:05,873 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
31 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 19:20:07,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1785680.0, ans=0.1 2024-08-12 19:20:08,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1785680.0, ans=0.0 2024-08-12 19:20:22,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1785680.0, ans=0.0 2024-08-12 19:20:43,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1785880.0, ans=0.125 2024-08-12 19:21:12,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4700, loss[loss=0.1022, beats_loss=0.01207, ecapa_loss=0.0001923, whisper_loss=0.08824, over 16638.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.000173, whisper_loss=0.09159, over 3887940.00 frames. ], batch size: 68, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:21:13,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1786080.0, ans=0.1 2024-08-12 19:21:21,657 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 19:21:43,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.537e+01 2.789e+01 3.116e+01 4.712e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-12 19:21:43,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1786280.0, ans=0.0 2024-08-12 19:22:09,812 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 19:22:16,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1786480.0, ans=0.125 2024-08-12 19:22:16,783 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.72 vs. limit=15.0 2024-08-12 19:22:27,066 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 19:22:32,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4750, loss[loss=0.1098, beats_loss=0.01022, ecapa_loss=0.0001528, whisper_loss=0.09804, over 19416.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001736, whisper_loss=0.09112, over 3887217.10 frames. ], batch size: 76, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:22:37,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1786580.0, ans=0.0 2024-08-12 19:22:38,576 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 19:22:46,478 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 19:22:50,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-12 19:23:17,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-08-12 19:23:17,539 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 19:23:24,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1786880.0, ans=0.0 2024-08-12 19:23:38,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1786980.0, ans=0.125 2024-08-12 19:23:50,884 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4800, loss[loss=0.1152, beats_loss=0.008855, ecapa_loss=0.0002063, whisper_loss=0.1043, over 21574.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001743, whisper_loss=0.09143, over 3886421.75 frames. ], batch size: 89, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:23:59,105 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 19:24:00,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1787080.0, ans=0.125 2024-08-12 19:24:06,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2024-08-12 19:24:07,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1787180.0, ans=0.0 2024-08-12 19:24:11,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-12 19:24:20,411 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.537e+01 2.789e+01 3.212e+01 6.421e+01, threshold=5.577e+01, percent-clipped=1.0 2024-08-12 19:24:24,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1787280.0, ans=0.2 2024-08-12 19:24:30,056 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 19:24:32,146 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 19:24:46,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1787380.0, ans=0.0 2024-08-12 19:25:10,353 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4850, loss[loss=0.07216, beats_loss=0.009978, ecapa_loss=0.0001684, whisper_loss=0.0605, over 17143.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001745, whisper_loss=0.09118, over 3895168.36 frames. ], batch size: 67, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:25:35,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1787680.0, ans=0.0 2024-08-12 19:25:37,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1787680.0, ans=0.0 2024-08-12 19:25:45,089 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:25:45,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-12 19:25:57,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1787780.0, ans=0.035 2024-08-12 19:26:29,063 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 32 from Vox, 41 fro AS 2024-08-12 19:26:32,723 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 19:26:35,131 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4900, loss[loss=0.09707, beats_loss=0.0117, ecapa_loss=0.0001516, whisper_loss=0.08386, over 22279.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001737, whisper_loss=0.09111, over 3900591.73 frames. 
], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:26:38,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1788080.0, ans=0.125 2024-08-12 19:26:48,785 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 19:26:57,042 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 19:27:06,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.493e+01 2.714e+01 3.066e+01 4.979e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-12 19:27:09,699 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 19:27:24,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1788380.0, ans=0.0 2024-08-12 19:27:27,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1788380.0, ans=0.0 2024-08-12 19:27:27,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1788380.0, ans=0.0 2024-08-12 19:27:27,881 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-12 19:27:29,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1788380.0, ans=0.125 2024-08-12 19:27:54,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2024-08-12 19:27:56,330 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 4950, loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001916, whisper_loss=0.09211, over 18774.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001744, whisper_loss=0.09126, over 3890766.57 frames. ], batch size: 77, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:27:56,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1788580.0, ans=0.125 2024-08-12 19:28:07,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1788580.0, ans=0.125 2024-08-12 19:28:12,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1788680.0, ans=0.07 2024-08-12 19:28:18,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1788680.0, ans=0.125 2024-08-12 19:28:45,599 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 19:28:47,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2024-08-12 19:28:49,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1788880.0, ans=0.0 2024-08-12 19:29:00,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1788980.0, ans=0.95 2024-08-12 19:29:06,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1788980.0, ans=0.125 2024-08-12 19:29:15,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. 
limit=12.0 2024-08-12 19:29:15,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5000, loss[loss=0.09721, beats_loss=0.01268, ecapa_loss=0.0001384, whisper_loss=0.08314, over 15368.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01097, ecapa_loss=0.0001745, whisper_loss=0.09147, over 3881373.18 frames. ], batch size: 59, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:29:29,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1789080.0, ans=0.0 2024-08-12 19:29:40,985 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 20 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-12 19:29:47,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.488e+01 2.839e+01 3.204e+01 5.431e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 19:29:52,177 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 19:30:00,592 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 19:30:04,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1789280.0, ans=0.125 2024-08-12 19:30:10,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1789380.0, ans=0.125 2024-08-12 19:30:31,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-08-12 19:30:38,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5050, loss[loss=0.101, beats_loss=0.01106, ecapa_loss=0.0001579, whisper_loss=0.08832, over 16105.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0111, ecapa_loss=0.0001734, whisper_loss=0.09111, over 3868475.33 frames. 
], batch size: 62, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:31:16,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1789780.0, ans=0.2 2024-08-12 19:31:23,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1789880.0, ans=0.125 2024-08-12 19:31:46,674 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 19:31:54,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5100, loss[loss=0.1066, beats_loss=0.009419, ecapa_loss=0.000203, whisper_loss=0.09514, over 18417.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.000173, whisper_loss=0.09151, over 3862283.32 frames. ], batch size: 74, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:32:02,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2024-08-12 19:32:10,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1790180.0, ans=0.125 2024-08-12 19:32:15,721 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 19:32:20,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.471e+01 2.779e+01 3.135e+01 9.153e+01, threshold=5.559e+01, percent-clipped=1.0 2024-08-12 19:32:35,286 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 19:32:47,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1790380.0, ans=0.2 2024-08-12 19:32:49,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1790480.0, ans=0.125 2024-08-12 19:33:03,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5150, loss[loss=0.06683, beats_loss=0.01323, ecapa_loss=0.0001557, whisper_loss=0.05204, over 13550.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01109, ecapa_loss=0.0001723, whisper_loss=0.0916, over 3876732.34 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:33:12,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1790580.0, ans=0.0 2024-08-12 19:33:19,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1790680.0, ans=0.0 2024-08-12 19:33:26,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-08-12 19:33:38,074 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 19:33:39,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1790780.0, ans=0.125 2024-08-12 19:33:40,652 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 19:33:42,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1790880.0, ans=0.125 2024-08-12 19:33:55,129 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-12 19:34:10,406 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5200, loss[loss=0.09403, beats_loss=0.01002, ecapa_loss=0.0001755, whisper_loss=0.08225, over 18607.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001723, whisper_loss=0.09166, over 3850375.68 frames. ], batch size: 74, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:34:11,974 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 19:34:15,979 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 19:34:19,897 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 19:34:28,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1791180.0, ans=0.0 2024-08-12 19:34:36,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.499e+01 2.713e+01 3.001e+01 1.517e+02, threshold=5.426e+01, percent-clipped=1.0 2024-08-12 19:34:38,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1791280.0, ans=0.125 2024-08-12 19:34:48,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=12.0 2024-08-12 19:34:56,250 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 19:35:19,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-08-12 19:35:19,371 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5250, loss[loss=0.128, beats_loss=0.007466, ecapa_loss=0.0001893, whisper_loss=0.1186, over 16015.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.000172, whisper_loss=0.09105, over 3845120.99 frames. ], batch size: 62, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:35:19,588 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 19:35:26,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1791580.0, ans=0.125 2024-08-12 19:35:34,820 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 19:35:40,197 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 19:35:49,704 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-12 19:35:57,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1791780.0, ans=0.1 2024-08-12 19:36:20,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-12 19:36:28,602 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5300, loss[loss=0.1348, beats_loss=0.007153, ecapa_loss=0.00021, whisper_loss=0.1256, over 22460.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.011, ecapa_loss=0.0001712, whisper_loss=0.09158, over 3886231.59 frames. ], batch size: 88, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:36:31,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1792080.0, ans=0.2 2024-08-12 19:36:34,090 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 19:36:38,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1792080.0, ans=0.125 2024-08-12 19:36:45,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1792180.0, ans=0.125 2024-08-12 19:36:54,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.416e+01 2.797e+01 3.236e+01 7.041e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 19:36:59,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1792280.0, ans=0.125 2024-08-12 19:37:10,470 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 19:37:24,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1792480.0, ans=0.04949747468305833 2024-08-12 19:37:35,827 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5350, loss[loss=0.1139, beats_loss=0.009503, ecapa_loss=0.0001994, whisper_loss=0.1024, over 20563.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01099, ecapa_loss=0.0001713, whisper_loss=0.09203, over 3868000.14 frames. ], batch size: 82, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:38:06,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1792780.0, ans=0.125 2024-08-12 19:38:09,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1792780.0, ans=0.1 2024-08-12 19:38:27,883 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
36 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 19:38:44,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5400, loss[loss=0.1249, beats_loss=0.01078, ecapa_loss=0.0001699, whisper_loss=0.1125, over 21949.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001725, whisper_loss=0.09274, over 3866711.80 frames. ], batch size: 87, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:38:48,292 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 19:39:09,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.477e+01 2.760e+01 3.199e+01 8.149e+01, threshold=5.520e+01, percent-clipped=2.0 2024-08-12 19:39:10,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1793280.0, ans=0.125 2024-08-12 19:39:36,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1793380.0, ans=0.1 2024-08-12 19:39:37,659 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.310e-03 2024-08-12 19:39:53,580 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5450, loss[loss=0.09222, beats_loss=0.01179, ecapa_loss=0.000206, whisper_loss=0.07836, over 12812.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01085, ecapa_loss=0.0001728, whisper_loss=0.09272, over 3867166.77 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:39:55,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1793580.0, ans=0.125 2024-08-12 19:39:55,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1793580.0, ans=0.125 2024-08-12 19:40:19,046 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-12 19:40:25,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1793780.0, ans=0.125 2024-08-12 19:40:41,735 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:40:50,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-12 19:41:00,844 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 19:41:03,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1793980.0, ans=0.1 2024-08-12 19:41:05,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1794080.0, ans=0.125 2024-08-12 19:41:06,638 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5500, loss[loss=0.07217, beats_loss=0.01325, ecapa_loss=0.0001916, whisper_loss=0.05701, over 21248.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01079, ecapa_loss=0.0001728, whisper_loss=0.09305, over 3899500.45 frames. 
], batch size: 89, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:41:08,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1794080.0, ans=0.2 2024-08-12 19:41:11,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1794080.0, ans=0.1 2024-08-12 19:41:28,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1794180.0, ans=0.0 2024-08-12 19:41:32,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.427e+01 2.827e+01 3.059e+01 4.853e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 19:41:48,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1794280.0, ans=0.125 2024-08-12 19:41:49,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1794380.0, ans=0.125 2024-08-12 19:41:54,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1794380.0, ans=0.0 2024-08-12 19:42:15,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-08-12 19:42:24,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5550, loss[loss=0.09891, beats_loss=0.01105, ecapa_loss=0.0002202, whisper_loss=0.08565, over 17257.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01082, ecapa_loss=0.0001718, whisper_loss=0.09332, over 3914954.58 frames. 
], batch size: 73, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:42:38,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1794580.0, ans=0.1 2024-08-12 19:42:40,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1794680.0, ans=0.04949747468305833 2024-08-12 19:43:05,439 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:43:07,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1794780.0, ans=0.125 2024-08-12 19:43:30,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1794880.0, ans=0.0 2024-08-12 19:43:32,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1794880.0, ans=0.125 2024-08-12 19:43:44,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1794980.0, ans=0.125 2024-08-12 19:43:45,993 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 19:43:47,310 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-12 19:43:49,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5600, loss[loss=0.09837, beats_loss=0.01152, ecapa_loss=0.0001246, whisper_loss=0.0856, over 19282.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01091, ecapa_loss=0.0001705, whisper_loss=0.09311, over 3912195.24 frames. 
], batch size: 75, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:43:51,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-12 19:44:05,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1795180.0, ans=0.1 2024-08-12 19:44:07,775 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 14 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 19:44:24,202 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.502e+01 2.768e+01 3.142e+01 4.658e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 19:44:38,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2024-08-12 19:45:22,779 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5650, loss[loss=0.1061, beats_loss=0.01227, ecapa_loss=0.0001457, whisper_loss=0.09234, over 18942.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01102, ecapa_loss=0.0001708, whisper_loss=0.09204, over 3938672.31 frames. 
], batch size: 74, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:45:23,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1795580.0, ans=10.0 2024-08-12 19:45:31,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1795580.0, ans=0.0 2024-08-12 19:45:45,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1795680.0, ans=0.09899494936611666 2024-08-12 19:46:27,811 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.436e-01 2024-08-12 19:46:56,997 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5700, loss[loss=0.09607, beats_loss=0.01215, ecapa_loss=0.0001802, whisper_loss=0.08211, over 21410.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001717, whisper_loss=0.09188, over 3950914.84 frames. ], batch size: 89, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:47:24,137 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 19:47:33,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.533e+01 2.876e+01 3.216e+01 4.377e+01, threshold=5.753e+01, percent-clipped=0.0 2024-08-12 19:47:37,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=12.0 2024-08-12 19:47:48,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1796280.0, ans=0.125 2024-08-12 19:47:50,516 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 19:47:54,489 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.896e+00 2024-08-12 19:48:20,048 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 19:48:20,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1796480.0, ans=0.1 2024-08-12 19:48:30,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5750, loss[loss=0.1202, beats_loss=0.009928, ecapa_loss=0.0002033, whisper_loss=0.1082, over 15920.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.000172, whisper_loss=0.09201, over 3950851.46 frames. ], batch size: 61, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:49:10,863 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 19:49:36,904 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 39 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 19:49:47,296 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 19:50:01,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5800, loss[loss=0.1091, beats_loss=0.008841, ecapa_loss=0.000151, whisper_loss=0.09876, over 21043.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001723, whisper_loss=0.09225, over 3927745.81 frames. ], batch size: 78, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:50:04,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1797080.0, ans=0.0 2024-08-12 19:50:05,856 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 19:50:12,669 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 19:50:20,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1797180.0, ans=0.0 2024-08-12 19:50:26,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-08-12 19:50:27,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.440e+01 2.724e+01 3.167e+01 6.575e+01, threshold=5.447e+01, percent-clipped=2.0 2024-08-12 19:50:30,690 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 19:50:40,676 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0 2024-08-12 19:50:50,379 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2024-08-12 19:50:51,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1797380.0, ans=0.2 2024-08-12 19:51:14,161 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5850, loss[loss=0.0779, beats_loss=0.01209, ecapa_loss=0.0001188, whisper_loss=0.06462, over 15644.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001707, whisper_loss=0.09207, over 3931108.50 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:52:03,510 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 19:52:03,882 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 19:52:26,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5900, loss[loss=0.09957, beats_loss=0.01044, ecapa_loss=0.0001896, whisper_loss=0.08723, over 14347.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01107, ecapa_loss=0.0001709, whisper_loss=0.09138, over 3904347.40 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:52:26,511 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 19:52:38,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1798080.0, ans=0.125 2024-08-12 19:52:48,145 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 19:52:54,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.654e+01 2.967e+01 3.336e+01 4.788e+01, threshold=5.934e+01, percent-clipped=0.0 2024-08-12 19:52:57,549 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 19:53:03,456 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 19:53:20,676 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-12 19:53:21,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2024-08-12 19:53:36,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2024-08-12 19:53:38,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 5950, loss[loss=0.098, beats_loss=0.01233, ecapa_loss=0.0001778, whisper_loss=0.08389, over 22836.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01104, ecapa_loss=0.0001719, whisper_loss=0.09061, over 3888883.44 frames. ], batch size: 92, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:53:40,430 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 19:53:55,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-12 19:53:56,438 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 15 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 19:54:14,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1798780.0, ans=0.125 2024-08-12 19:54:16,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-12 19:54:18,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1798780.0, ans=0.0 2024-08-12 19:54:34,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1798880.0, ans=0.0 2024-08-12 19:54:55,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6000, loss[loss=0.09087, beats_loss=0.0119, ecapa_loss=0.0001604, whisper_loss=0.07736, over 17006.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01107, ecapa_loss=0.0001714, whisper_loss=0.0907, over 3893935.77 frames. 
], batch size: 69, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:54:55,087 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 19:55:33,582 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005899, whisper_loss=0.2486, over 922467.00 frames. 2024-08-12 19:55:50,021 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on SV_voxceleb1: loss=0.004696, beats_loss=0, ecapa_loss=0.0004696, whisper_loss=0, over 939242.00 frames. 2024-08-12 19:57:46,517 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on AT_audioset: loss=0.02428, beats_loss=0.02428, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 19:57:46,521 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 19:57:47,974 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 19:58:16,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.501e+01 2.791e+01 3.141e+01 5.827e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-12 19:58:21,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1799280.0, ans=0.125 2024-08-12 19:58:30,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.20 vs. limit=8.0 2024-08-12 19:58:42,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1799380.0, ans=0.5 2024-08-12 19:58:53,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.73 vs. 
limit=22.5 2024-08-12 19:58:54,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1799480.0, ans=0.125 2024-08-12 19:58:55,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1799480.0, ans=0.125 2024-08-12 19:58:59,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1799480.0, ans=0.125 2024-08-12 19:59:01,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1799480.0, ans=0.0 2024-08-12 19:59:04,320 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6050, loss[loss=0.1007, beats_loss=0.01412, ecapa_loss=0.0001385, whisper_loss=0.08514, over 21677.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001714, whisper_loss=0.0912, over 3867196.61 frames. ], batch size: 83, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:59:06,672 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 19:59:23,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1799680.0, ans=0.0 2024-08-12 19:59:54,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1799880.0, ans=0.125 2024-08-12 20:00:01,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-12 20:00:06,448 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 20:00:19,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1799980.0, ans=0.0 2024-08-12 20:00:22,245 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 20:00:24,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6100, loss[loss=0.1139, beats_loss=0.009887, ecapa_loss=0.0001935, whisper_loss=0.1021, over 19126.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01104, ecapa_loss=0.0001728, whisper_loss=0.09067, over 3882007.80 frames. ], batch size: 77, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:00:33,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1800080.0, ans=0.2 2024-08-12 20:00:33,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1800080.0, ans=0.0 2024-08-12 20:00:34,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1800080.0, ans=0.125 2024-08-12 20:00:52,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1800180.0, ans=0.125 2024-08-12 20:00:55,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.407e+01 2.685e+01 3.141e+01 4.380e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 20:01:22,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1800380.0, ans=0.125 2024-08-12 20:01:26,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.56 vs. 
limit=6.0 2024-08-12 20:01:31,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1800480.0, ans=0.0 2024-08-12 20:01:32,710 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 20:01:35,600 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 20:01:42,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6150, loss[loss=0.09832, beats_loss=0.009742, ecapa_loss=0.0002142, whisper_loss=0.08643, over 18527.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01104, ecapa_loss=0.0001725, whisper_loss=0.09072, over 3894735.76 frames. ], batch size: 75, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:01:46,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1800580.0, ans=0.0 2024-08-12 20:01:57,039 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 20:02:08,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1800680.0, ans=0.1 2024-08-12 20:02:25,016 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 20:02:25,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1800780.0, ans=0.2 2024-08-12 20:02:58,078 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6200, loss[loss=0.08667, beats_loss=0.01424, ecapa_loss=0.0001248, whisper_loss=0.07118, over 18609.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01111, ecapa_loss=0.0001718, whisper_loss=0.09063, over 3866758.86 frames. 
], batch size: 73, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:03:02,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1801080.0, ans=0.1 2024-08-12 20:03:20,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1801180.0, ans=0.2 2024-08-12 20:03:27,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.462e+01 2.878e+01 3.273e+01 2.094e+02, threshold=5.757e+01, percent-clipped=3.0 2024-08-12 20:03:31,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1801280.0, ans=0.07 2024-08-12 20:04:12,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.52 vs. limit=10.0 2024-08-12 20:04:13,892 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6250, loss[loss=0.08979, beats_loss=0.01258, ecapa_loss=0.0001489, whisper_loss=0.07571, over 15983.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01107, ecapa_loss=0.0001726, whisper_loss=0.09092, over 3856219.32 frames. 
], batch size: 61, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:04:38,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1801680.0, ans=0.1 2024-08-12 20:04:56,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1801880.0, ans=0.0 2024-08-12 20:05:04,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1801880.0, ans=0.125 2024-08-12 20:05:27,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1802080.0, ans=0.2 2024-08-12 20:05:28,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6300, loss[loss=0.09561, beats_loss=0.01212, ecapa_loss=0.0001548, whisper_loss=0.08194, over 22324.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001732, whisper_loss=0.09138, over 3853594.66 frames. ], batch size: 90, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:05:28,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1802080.0, ans=0.125 2024-08-12 20:05:31,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1802080.0, ans=0.015 2024-08-12 20:05:51,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.65 vs. 
limit=15.0 2024-08-12 20:05:55,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1802180.0, ans=0.1 2024-08-12 20:05:57,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.436e+01 2.696e+01 3.138e+01 5.310e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-12 20:06:01,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2024-08-12 20:06:11,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1802280.0, ans=0.1 2024-08-12 20:06:14,781 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 20:06:22,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1802380.0, ans=10.0 2024-08-12 20:06:32,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1802480.0, ans=0.0 2024-08-12 20:06:34,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2024-08-12 20:06:43,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6350, loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.000149, whisper_loss=0.09224, over 23417.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01097, ecapa_loss=0.0001741, whisper_loss=0.09229, over 3869905.35 frames. ], batch size: 93, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:06:43,638 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 20:06:52,629 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 20:06:54,209 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-12 20:07:07,733 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 20:07:09,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.04 vs. limit=10.0 2024-08-12 20:07:11,415 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 20:07:16,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1802780.0, ans=0.125 2024-08-12 20:07:35,273 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 20:07:42,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1802980.0, ans=0.125 2024-08-12 20:07:49,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1802980.0, ans=0.125 2024-08-12 20:07:49,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1802980.0, ans=0.2 2024-08-12 20:07:53,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1802980.0, ans=0.0 2024-08-12 20:07:57,085 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6400, loss[loss=0.1265, beats_loss=0.01075, ecapa_loss=0.0001601, whisper_loss=0.1141, over 22359.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01103, ecapa_loss=0.0001734, whisper_loss=0.09224, over 3885674.52 frames. 
], batch size: 86, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:08:02,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5 2024-08-12 20:08:08,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1803080.0, ans=0.0 2024-08-12 20:08:24,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.558e+01 2.846e+01 3.413e+01 1.173e+02, threshold=5.692e+01, percent-clipped=2.0 2024-08-12 20:08:34,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1803280.0, ans=0.0 2024-08-12 20:08:47,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1803380.0, ans=0.1 2024-08-12 20:08:49,962 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 20:08:57,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1803480.0, ans=0.2 2024-08-12 20:09:08,419 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6450, loss[loss=0.08519, beats_loss=0.01016, ecapa_loss=0.0001905, whisper_loss=0.07313, over 16881.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0111, ecapa_loss=0.0001732, whisper_loss=0.09141, over 3904387.36 frames. ], batch size: 68, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:09:12,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. 
limit=15.0 2024-08-12 20:09:23,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1803680.0, ans=0.125 2024-08-12 20:09:31,088 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 20:09:37,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1803780.0, ans=0.0 2024-08-12 20:09:37,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-08-12 20:09:48,310 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 20:09:56,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1803880.0, ans=0.1 2024-08-12 20:09:57,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1803880.0, ans=0.2 2024-08-12 20:10:07,089 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 20:10:11,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1803980.0, ans=0.125 2024-08-12 20:10:12,676 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:10:20,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6500, loss[loss=0.09841, beats_loss=0.01217, ecapa_loss=0.0002224, whisper_loss=0.08402, over 19347.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001734, whisper_loss=0.09241, over 3927434.21 frames. 
], batch size: 86, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:10:20,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1804080.0, ans=0.125 2024-08-12 20:10:45,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-12 20:10:48,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.417e+01 2.617e+01 2.819e+01 4.970e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-12 20:10:51,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-12 20:10:53,357 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 20:11:02,320 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=12.0 2024-08-12 20:11:07,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1804380.0, ans=0.0 2024-08-12 20:11:09,910 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 20:11:10,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1804380.0, ans=0.0 2024-08-12 20:11:10,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1804380.0, ans=0.125 2024-08-12 20:11:17,221 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 20:11:29,723 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 20:11:30,757 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6550, loss[loss=0.1083, beats_loss=0.012, ecapa_loss=0.0001999, whisper_loss=0.09432, over 22266.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001732, whisper_loss=0.09275, over 3960900.79 frames. ], batch size: 90, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:11:35,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1804580.0, ans=0.1 2024-08-12 20:11:36,430 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 20:12:00,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=10.0 2024-08-12 20:12:23,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1804880.0, ans=0.125 2024-08-12 20:12:39,693 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6600, loss[loss=0.1193, beats_loss=0.01124, ecapa_loss=0.0001699, whisper_loss=0.1064, over 13239.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01091, ecapa_loss=0.0001746, whisper_loss=0.09351, over 3966647.77 frames. ], batch size: 54, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:12:41,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1805080.0, ans=0.2 2024-08-12 20:12:42,560 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
37 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-12 20:12:59,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1805180.0, ans=0.125 2024-08-12 20:13:06,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.480e+01 2.766e+01 3.110e+01 5.063e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 20:13:23,377 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:13:27,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1805380.0, ans=0.05 2024-08-12 20:13:31,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1805380.0, ans=0.0 2024-08-12 20:13:37,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1805480.0, ans=0.125 2024-08-12 20:13:47,722 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6650, loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001497, whisper_loss=0.09115, over 15568.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001739, whisper_loss=0.09253, over 3966308.27 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:13:52,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1805580.0, ans=0.125 2024-08-12 20:14:10,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1805680.0, ans=0.125 2024-08-12 20:14:48,372 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 20:14:56,304 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6700, loss[loss=0.1059, beats_loss=0.01278, ecapa_loss=0.0001392, whisper_loss=0.09169, over 21197.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001733, whisper_loss=0.09248, over 3951029.39 frames. ], batch size: 82, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:15:00,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1806080.0, ans=0.025 2024-08-12 20:15:11,482 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 20:15:23,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.568e+01 2.820e+01 3.306e+01 6.884e+01, threshold=5.641e+01, percent-clipped=3.0 2024-08-12 20:15:35,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1806280.0, ans=0.125 2024-08-12 20:15:46,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1806380.0, ans=10.0 2024-08-12 20:15:53,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.50 vs. limit=10.0 2024-08-12 20:16:05,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6750, loss[loss=0.09599, beats_loss=0.008869, ecapa_loss=0.0001859, whisper_loss=0.08526, over 22182.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.0001739, whisper_loss=0.09293, over 3941148.21 frames. 
], batch size: 89, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:16:11,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1806580.0, ans=0.1 2024-08-12 20:16:36,916 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 20:16:47,020 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:17:00,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=15.0 2024-08-12 20:17:12,979 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 20:17:15,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6800, loss[loss=0.1124, beats_loss=0.009809, ecapa_loss=0.0002213, whisper_loss=0.1004, over 21885.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001742, whisper_loss=0.09235, over 3939128.21 frames. ], batch size: 92, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:17:32,338 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 20:17:43,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.433e+01 2.678e+01 3.224e+01 5.136e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-12 20:17:44,566 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-12 20:17:48,726 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 20:17:55,692 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 20:18:09,203 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 20:18:23,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1807580.0, ans=0.125 2024-08-12 20:18:24,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6850, loss[loss=0.1103, beats_loss=0.01094, ecapa_loss=0.000163, whisper_loss=0.09769, over 21166.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01084, ecapa_loss=0.0001744, whisper_loss=0.09255, over 3891116.78 frames. ], batch size: 82, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:18:39,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1807680.0, ans=0.125 2024-08-12 20:18:41,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1807680.0, ans=0.125 2024-08-12 20:18:45,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-12 20:18:46,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1807680.0, ans=0.125 2024-08-12 20:18:47,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1807680.0, ans=0.0 2024-08-12 20:18:49,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2024-08-12 20:19:20,317 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 15 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 20:19:29,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=15.0 2024-08-12 20:19:33,788 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6900, loss[loss=0.1244, beats_loss=0.00911, ecapa_loss=0.0001624, whisper_loss=0.1136, over 22375.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001747, whisper_loss=0.09238, over 3880768.25 frames. ], batch size: 86, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:19:34,137 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 12 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 20:19:40,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-12 20:19:45,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1808080.0, ans=0.0 2024-08-12 20:19:55,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1808180.0, ans=0.125 2024-08-12 20:20:01,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.402e+01 2.709e+01 3.139e+01 1.091e+02, threshold=5.419e+01, percent-clipped=1.0 2024-08-12 20:20:22,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1808380.0, ans=0.125 2024-08-12 20:20:25,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1808380.0, ans=0.125 2024-08-12 20:20:31,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1808480.0, ans=0.125 2024-08-12 20:20:39,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. 
limit=22.5 2024-08-12 20:20:41,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 6950, loss[loss=0.1166, beats_loss=0.009613, ecapa_loss=0.0001636, whisper_loss=0.1053, over 15732.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01084, ecapa_loss=0.0001739, whisper_loss=0.09256, over 3862822.99 frames. ], batch size: 60, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:20:46,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1808580.0, ans=0.125 2024-08-12 20:20:47,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1808580.0, ans=0.2 2024-08-12 20:20:49,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.14 vs. limit=10.0 2024-08-12 20:20:57,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1808680.0, ans=0.0 2024-08-12 20:21:00,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1808680.0, ans=0.1 2024-08-12 20:21:11,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.44 vs. limit=10.0 2024-08-12 20:21:15,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. 
limit=10.0 2024-08-12 20:21:20,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1808780.0, ans=0.2 2024-08-12 20:21:25,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1808880.0, ans=0.07 2024-08-12 20:21:52,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7000, loss[loss=0.119, beats_loss=0.01077, ecapa_loss=0.000159, whisper_loss=0.1066, over 23398.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001729, whisper_loss=0.09265, over 3880282.45 frames. ], batch size: 91, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:21:55,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1809080.0, ans=0.1 2024-08-12 20:21:56,640 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 20:22:00,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.93 vs. limit=10.0 2024-08-12 20:22:03,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1809080.0, ans=0.0 2024-08-12 20:22:17,470 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 21 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 20:22:19,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.381e+01 2.667e+01 3.091e+01 4.298e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-12 20:22:21,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. 
limit=15.0 2024-08-12 20:22:22,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1809280.0, ans=0.125 2024-08-12 20:22:25,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1809280.0, ans=0.1 2024-08-12 20:22:26,764 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 20:22:28,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1809280.0, ans=0.125 2024-08-12 20:22:35,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1809380.0, ans=0.0 2024-08-12 20:23:01,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7050, loss[loss=0.1041, beats_loss=0.009188, ecapa_loss=0.0002001, whisper_loss=0.09289, over 16640.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001734, whisper_loss=0.09211, over 3877030.09 frames. ], batch size: 69, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:23:12,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1809580.0, ans=0.1 2024-08-12 20:23:17,860 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 20:23:33,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1809780.0, ans=0.125 2024-08-12 20:23:47,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1809880.0, ans=0.0 2024-08-12 20:23:54,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1809880.0, ans=0.125 2024-08-12 20:24:01,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2024-08-12 20:24:04,016 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 20:24:04,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1809980.0, ans=0.125 2024-08-12 20:24:04,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-08-12 20:24:10,821 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7100, loss[loss=0.1249, beats_loss=0.009249, ecapa_loss=0.0001629, whisper_loss=0.114, over 23746.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001717, whisper_loss=0.09143, over 3884252.90 frames. ], batch size: 91, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:24:15,648 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 20:24:24,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1810180.0, ans=0.2 2024-08-12 20:24:29,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1810180.0, ans=0.1 2024-08-12 20:24:31,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-12 20:24:34,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1810180.0, ans=0.1 2024-08-12 20:24:38,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.554e+01 2.752e+01 3.133e+01 4.741e+01, threshold=5.504e+01, percent-clipped=0.0 2024-08-12 20:24:48,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-12 20:24:51,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-12 20:24:59,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-12 20:25:03,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1810380.0, ans=0.2 2024-08-12 20:25:04,586 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 20:25:19,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7150, loss[loss=0.08588, beats_loss=0.01129, ecapa_loss=0.0001739, whisper_loss=0.07285, over 21781.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001721, whisper_loss=0.09104, over 3885713.41 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:25:34,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1810680.0, ans=0.125 2024-08-12 20:25:37,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1810680.0, ans=0.09899494936611666 2024-08-12 20:25:45,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1810780.0, ans=0.0 2024-08-12 20:25:50,946 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 11 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 20:26:09,618 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 20:26:16,842 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-12 20:26:22,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1810980.0, ans=0.0 2024-08-12 20:26:28,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7200, loss[loss=0.111, beats_loss=0.01206, ecapa_loss=0.0001769, whisper_loss=0.09718, over 22364.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01099, ecapa_loss=0.0001702, whisper_loss=0.0909, over 3856213.28 frames. 
], batch size: 91, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:26:29,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1811080.0, ans=0.1 2024-08-12 20:26:29,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1811080.0, ans=0.125 2024-08-12 20:26:33,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1811080.0, ans=0.0 2024-08-12 20:26:34,547 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:26:43,870 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 20:26:44,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1811180.0, ans=0.125 2024-08-12 20:26:47,826 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 20:26:55,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.470e+01 2.758e+01 3.060e+01 4.587e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-12 20:26:55,890 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 20:27:37,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7250, loss[loss=0.1332, beats_loss=0.01067, ecapa_loss=0.0001601, whisper_loss=0.121, over 22501.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001699, whisper_loss=0.09176, over 3895492.22 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:27:38,598 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 20:27:44,390 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 20:27:58,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1811680.0, ans=0.0 2024-08-12 20:28:00,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1811680.0, ans=0.125 2024-08-12 20:28:12,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1811780.0, ans=0.125 2024-08-12 20:28:22,485 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 20:28:24,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1811880.0, ans=0.125 2024-08-12 20:28:28,063 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 20:28:33,876 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 20:28:43,309 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 20:28:46,304 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 20:28:47,393 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7300, loss[loss=0.1253, beats_loss=0.008193, ecapa_loss=0.0001461, whisper_loss=0.1156, over 20333.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001698, whisper_loss=0.0921, over 3909974.92 frames. ], batch size: 74, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:28:48,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. 
limit=6.0 2024-08-12 20:28:53,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1812080.0, ans=0.0 2024-08-12 20:28:59,856 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 20:29:14,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.458e+01 2.787e+01 3.037e+01 3.790e+01, threshold=5.575e+01, percent-clipped=0.0 2024-08-12 20:29:30,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1812380.0, ans=0.0 2024-08-12 20:29:32,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1812380.0, ans=0.2 2024-08-12 20:29:34,382 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 20:29:43,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-12 20:29:44,146 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 20:29:53,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1812480.0, ans=0.1 2024-08-12 20:29:56,519 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7350, loss[loss=0.1076, beats_loss=0.00718, ecapa_loss=0.0001515, whisper_loss=0.09892, over 15854.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001709, whisper_loss=0.09217, over 3904767.50 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:29:59,604 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 20:30:06,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1812580.0, ans=0.125 2024-08-12 20:30:08,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1812680.0, ans=0.125 2024-08-12 20:30:13,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1812680.0, ans=0.125 2024-08-12 20:30:13,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1812680.0, ans=0.0 2024-08-12 20:30:20,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1812680.0, ans=0.0 2024-08-12 20:30:23,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1812780.0, ans=0.125 2024-08-12 20:30:23,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1812780.0, ans=0.09899494936611666 2024-08-12 20:30:25,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1812780.0, ans=0.2 2024-08-12 20:30:32,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1812780.0, ans=0.0 2024-08-12 20:31:04,888 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7400, loss[loss=0.1144, beats_loss=0.01109, ecapa_loss=0.0001591, whisper_loss=0.1017, over 21903.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001712, whisper_loss=0.0922, over 3924462.21 frames. ], batch size: 84, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:31:09,374 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 20:31:15,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1813080.0, ans=0.0 2024-08-12 20:31:20,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1813180.0, ans=0.125 2024-08-12 20:31:32,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.493e+01 2.726e+01 3.079e+01 4.243e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 20:31:33,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1813280.0, ans=0.2 2024-08-12 20:31:42,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1813280.0, ans=0.125 2024-08-12 20:31:43,601 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 27 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 20:31:45,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1813380.0, ans=0.0 2024-08-12 20:32:13,717 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7450, loss[loss=0.05922, beats_loss=0.01388, ecapa_loss=0.0001862, whisper_loss=0.04348, over 15788.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001712, whisper_loss=0.09253, over 3920787.17 frames. ], batch size: 70, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:32:23,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1813580.0, ans=0.2 2024-08-12 20:32:35,840 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 20:32:41,021 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 20:32:44,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-08-12 20:32:49,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1813780.0, ans=0.1 2024-08-12 20:33:03,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.07 vs. limit=22.5 2024-08-12 20:33:13,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1813980.0, ans=0.05 2024-08-12 20:33:14,676 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 20:33:16,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-12 20:33:21,747 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7500, loss[loss=0.1016, beats_loss=0.01178, ecapa_loss=0.0001511, whisper_loss=0.08829, over 21352.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001715, whisper_loss=0.09218, over 3930236.85 frames. ], batch size: 86, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:33:26,219 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 20:33:36,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-12 20:33:44,107 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 20:33:49,343 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.399e+01 2.676e+01 3.018e+01 5.657e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-12 20:33:50,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=15.0 2024-08-12 20:33:58,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1814280.0, ans=0.125 2024-08-12 20:33:59,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1814280.0, ans=0.125 2024-08-12 20:34:01,808 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 20:34:06,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-08-12 20:34:31,175 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7550, loss[loss=0.1055, beats_loss=0.01229, ecapa_loss=0.0001298, whisper_loss=0.09191, over 16670.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001721, whisper_loss=0.09159, over 3894893.58 frames. ], batch size: 61, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:34:44,898 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 20:35:24,345 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 20:35:38,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.04 vs. 
limit=10.0 2024-08-12 20:35:40,649 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7600, loss[loss=0.1091, beats_loss=0.01042, ecapa_loss=0.0001679, whisper_loss=0.09697, over 17617.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001738, whisper_loss=0.0922, over 3901261.92 frames. ], batch size: 69, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:35:43,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1815080.0, ans=0.125 2024-08-12 20:36:07,667 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 20:36:08,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.568e+01 2.871e+01 3.338e+01 1.735e+02, threshold=5.742e+01, percent-clipped=2.0 2024-08-12 20:36:17,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1815280.0, ans=0.07 2024-08-12 20:36:17,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1815280.0, ans=0.0 2024-08-12 20:36:20,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1815280.0, ans=0.0 2024-08-12 20:36:26,658 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 20:36:28,275 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-12 20:36:42,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1815480.0, ans=0.5 2024-08-12 20:36:45,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1815480.0, ans=0.125 2024-08-12 20:36:47,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0 2024-08-12 20:36:50,543 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7650, loss[loss=0.09871, beats_loss=0.009855, ecapa_loss=0.000183, whisper_loss=0.08703, over 17406.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001735, whisper_loss=0.09184, over 3870274.31 frames. ], batch size: 70, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:37:03,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1815680.0, ans=0.1 2024-08-12 20:37:04,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1815680.0, ans=0.125 2024-08-12 20:37:20,339 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.078e+01 2024-08-12 20:37:43,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1815880.0, ans=0.125 2024-08-12 20:37:57,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1815980.0, ans=0.125 2024-08-12 20:37:59,927 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7700, loss[loss=0.1055, beats_loss=0.008416, ecapa_loss=0.0002134, whisper_loss=0.09492, over 15023.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001734, whisper_loss=0.09158, over 3875957.87 frames. ], batch size: 60, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:38:03,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1816080.0, ans=0.0 2024-08-12 20:38:22,370 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-12 20:38:23,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-12 20:38:27,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.538e+01 2.763e+01 3.264e+01 5.327e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 20:38:39,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1816280.0, ans=0.125 2024-08-12 20:38:54,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1816480.0, ans=0.125 2024-08-12 20:38:59,906 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-12 20:39:01,210 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 20:39:08,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1816580.0, ans=0.0 2024-08-12 20:39:08,984 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7750, loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001815, whisper_loss=0.09054, over 16971.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001725, whisper_loss=0.09115, over 3890611.34 frames. 
], batch size: 68, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:39:16,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1816580.0, ans=0.125 2024-08-12 20:39:16,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1816580.0, ans=0.1 2024-08-12 20:39:26,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1816680.0, ans=0.125 2024-08-12 20:39:30,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1816680.0, ans=0.025 2024-08-12 20:39:36,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2024-08-12 20:39:45,397 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 34 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-12 20:39:49,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1816880.0, ans=0.0 2024-08-12 20:39:59,429 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 20:40:01,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1816880.0, ans=0.04949747468305833 2024-08-12 20:40:01,401 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. 
limit=22.5 2024-08-12 20:40:14,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1816980.0, ans=0.0 2024-08-12 20:40:18,084 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7800, loss[loss=0.08932, beats_loss=0.0119, ecapa_loss=0.0001853, whisper_loss=0.07557, over 21375.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001722, whisper_loss=0.09097, over 3879335.40 frames. ], batch size: 94, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:40:37,162 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 20:40:44,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1817280.0, ans=0.125 2024-08-12 20:40:45,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.560e+01 2.836e+01 3.091e+01 4.411e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 20:40:46,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=12.0 2024-08-12 20:40:55,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1817280.0, ans=0.125 2024-08-12 20:41:01,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1817380.0, ans=0.2 2024-08-12 20:41:04,061 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:41:11,912 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 20:41:12,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1817480.0, ans=0.2 2024-08-12 20:41:16,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1817480.0, ans=0.2 2024-08-12 20:41:16,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1817480.0, ans=0.1 2024-08-12 20:41:17,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1817480.0, ans=0.1 2024-08-12 20:41:18,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1817480.0, ans=0.09899494936611666 2024-08-12 20:41:19,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1817480.0, ans=0.04949747468305833 2024-08-12 20:41:27,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7850, loss[loss=0.1334, beats_loss=0.008299, ecapa_loss=0.0002108, whisper_loss=0.123, over 15131.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001708, whisper_loss=0.09084, over 3893283.15 frames. ], batch size: 60, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:41:32,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=15.0 2024-08-12 20:41:34,349 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:41:37,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1817580.0, ans=0.1 2024-08-12 20:41:56,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1817780.0, ans=0.0 2024-08-12 20:42:13,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1817880.0, ans=0.125 2024-08-12 20:42:15,808 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-12 20:42:17,084 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 20:42:31,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2024-08-12 20:42:34,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1817980.0, ans=0.0 2024-08-12 20:42:36,554 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7900, loss[loss=0.1054, beats_loss=0.01216, ecapa_loss=0.0001327, whisper_loss=0.09195, over 24229.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001704, whisper_loss=0.09155, over 3880443.32 frames. 
], batch size: 93, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:42:59,588 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.714e+01 2024-08-12 20:43:02,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1818180.0, ans=0.0 2024-08-12 20:43:04,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.497e+01 2.722e+01 3.152e+01 4.641e+01, threshold=5.444e+01, percent-clipped=0.0 2024-08-12 20:43:13,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1818280.0, ans=0.125 2024-08-12 20:43:23,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1818380.0, ans=0.125 2024-08-12 20:43:30,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1818480.0, ans=0.2 2024-08-12 20:43:45,624 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 7950, loss[loss=0.09889, beats_loss=0.01244, ecapa_loss=0.0001555, whisper_loss=0.0849, over 14998.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.011, ecapa_loss=0.0001706, whisper_loss=0.09138, over 3872153.81 frames. ], batch size: 62, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:47,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. 
limit=12.0 2024-08-12 20:43:50,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1818580.0, ans=0.0 2024-08-12 20:43:57,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1818580.0, ans=0.125 2024-08-12 20:44:04,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-12 20:44:06,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1818680.0, ans=0.0 2024-08-12 20:44:09,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1818680.0, ans=0.0 2024-08-12 20:44:42,705 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 20:44:45,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1818980.0, ans=0.125 2024-08-12 20:44:55,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8000, loss[loss=0.1024, beats_loss=0.0117, ecapa_loss=0.0001666, whisper_loss=0.08907, over 13562.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001699, whisper_loss=0.09175, over 3861624.09 frames. ], batch size: 53, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:44:55,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1819080.0, ans=0.0 2024-08-12 20:45:04,527 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
13 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 20:45:04,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1819080.0, ans=0.0 2024-08-12 20:45:04,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1819080.0, ans=0.125 2024-08-12 20:45:10,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1819180.0, ans=0.125 2024-08-12 20:45:20,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1819180.0, ans=0.125 2024-08-12 20:45:22,511 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.456e+01 2.721e+01 3.092e+01 4.967e+01, threshold=5.442e+01, percent-clipped=0.0 2024-08-12 20:45:27,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-08-12 20:45:35,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1819380.0, ans=0.0 2024-08-12 20:45:37,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1819380.0, ans=0.125 2024-08-12 20:45:39,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1819380.0, ans=0.07 2024-08-12 20:46:02,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-12 20:46:04,242 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8050, loss[loss=0.1201, beats_loss=0.01028, ecapa_loss=0.0001571, whisper_loss=0.1083, over 22931.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001689, whisper_loss=0.0921, over 3839784.77 frames. ], batch size: 92, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:46:10,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1819580.0, ans=15.0 2024-08-12 20:46:29,793 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 20:46:36,301 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 20:46:36,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1819780.0, ans=0.2 2024-08-12 20:46:48,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1819880.0, ans=0.0 2024-08-12 20:46:50,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1819880.0, ans=0.125 2024-08-12 20:46:52,608 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 20:46:58,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1819980.0, ans=0.1 2024-08-12 20:46:59,184 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 20:47:08,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1819980.0, ans=0.09899494936611666 2024-08-12 20:47:10,367 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. 
limit=15.0 2024-08-12 20:47:13,416 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8100, loss[loss=0.1098, beats_loss=0.01201, ecapa_loss=0.0001405, whisper_loss=0.09635, over 22933.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001689, whisper_loss=0.09196, over 3850001.01 frames. ], batch size: 92, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:47:19,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1820080.0, ans=0.0 2024-08-12 20:47:20,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1820080.0, ans=0.125 2024-08-12 20:47:23,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1820080.0, ans=0.0 2024-08-12 20:47:30,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1820180.0, ans=0.125 2024-08-12 20:47:35,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-08-12 20:47:37,711 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 20:47:39,223 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
17 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-12 20:47:40,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.501e+01 2.882e+01 3.230e+01 4.763e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 20:47:44,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1820280.0, ans=0.125 2024-08-12 20:48:16,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1820480.0, ans=0.02 2024-08-12 20:48:22,277 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8150, loss[loss=0.1078, beats_loss=0.009018, ecapa_loss=0.0002484, whisper_loss=0.09634, over 15877.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001696, whisper_loss=0.09202, over 3856648.75 frames. ], batch size: 67, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:48:22,549 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 20:48:24,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1820580.0, ans=0.125 2024-08-12 20:48:43,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1820680.0, ans=0.015 2024-08-12 20:48:48,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2024-08-12 20:48:59,621 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 20:49:03,789 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
24 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 20:49:20,673 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:49:21,716 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 20:49:31,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8200, loss[loss=0.1079, beats_loss=0.01276, ecapa_loss=0.0001274, whisper_loss=0.09386, over 23274.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001698, whisper_loss=0.09237, over 3891493.92 frames. ], batch size: 90, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:49:32,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1821080.0, ans=0.0 2024-08-12 20:49:36,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1821080.0, ans=0.125 2024-08-12 20:49:45,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1821180.0, ans=0.0 2024-08-12 20:49:51,218 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 20:49:59,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.516e+01 2.770e+01 3.136e+01 5.305e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 20:50:00,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1821280.0, ans=0.125 2024-08-12 20:50:06,326 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 20:50:18,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1821380.0, ans=0.0 2024-08-12 20:50:26,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1821480.0, ans=0.125 2024-08-12 20:50:35,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-12 20:50:38,165 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 20:50:40,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8250, loss[loss=0.09973, beats_loss=0.01046, ecapa_loss=0.0001804, whisper_loss=0.08747, over 21998.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01079, ecapa_loss=0.0001713, whisper_loss=0.09288, over 3887260.08 frames. ], batch size: 92, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:50:51,897 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 20:50:53,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1821680.0, ans=0.0 2024-08-12 20:51:04,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1821680.0, ans=0.2 2024-08-12 20:51:15,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1821780.0, ans=0.125 2024-08-12 20:51:16,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1821780.0, ans=0.125 2024-08-12 20:51:28,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1821880.0, ans=0.0 2024-08-12 20:51:35,504 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 20:51:43,750 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 20:51:48,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1821980.0, ans=0.1 2024-08-12 20:51:49,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2024-08-12 20:51:50,231 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8300, loss[loss=0.1205, beats_loss=0.01032, ecapa_loss=0.0001704, whisper_loss=0.1085, over 23182.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0109, ecapa_loss=0.0001701, whisper_loss=0.09254, over 3911655.71 frames. ], batch size: 91, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:51:56,187 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 20:51:58,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-12 20:52:01,591 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 20:52:17,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-12 20:52:17,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.463e+01 2.692e+01 3.120e+01 9.968e+01, threshold=5.383e+01, percent-clipped=3.0 2024-08-12 20:52:22,129 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 20:52:22,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1822280.0, ans=0.125 2024-08-12 20:52:43,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1822480.0, ans=0.125 2024-08-12 20:52:47,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-12 20:52:58,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8350, loss[loss=0.1198, beats_loss=0.0101, ecapa_loss=0.0001793, whisper_loss=0.108, over 20221.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.0001704, whisper_loss=0.0923, over 3920317.20 frames. ], batch size: 82, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:53:03,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1822580.0, ans=0.1 2024-08-12 20:53:11,529 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 20:53:33,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1822780.0, ans=0.125 2024-08-12 20:53:39,261 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 20:53:48,138 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 20:53:52,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1822980.0, ans=0.125 2024-08-12 20:53:53,657 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 20:53:53,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1822980.0, ans=0.125 2024-08-12 20:53:59,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1822980.0, ans=0.2 2024-08-12 20:54:07,922 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8400, loss[loss=0.1059, beats_loss=0.01181, ecapa_loss=0.0001818, whisper_loss=0.09223, over 14341.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01093, ecapa_loss=0.000171, whisper_loss=0.09224, over 3885421.45 frames. ], batch size: 56, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:54:16,188 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 20:54:19,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1823080.0, ans=0.0 2024-08-12 20:54:32,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1823180.0, ans=0.2 2024-08-12 20:54:35,956 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.526e+01 2.875e+01 3.220e+01 4.758e+01, threshold=5.750e+01, percent-clipped=0.0 2024-08-12 20:54:42,091 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 20:54:48,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-12 20:54:49,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1823380.0, ans=0.1 2024-08-12 20:54:51,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1823380.0, ans=0.125 2024-08-12 20:54:52,205 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 20:54:57,643 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-12 20:55:02,036 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 20:55:13,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1823480.0, ans=0.125 2024-08-12 20:55:18,863 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8450, loss[loss=0.121, beats_loss=0.009504, ecapa_loss=0.0002025, whisper_loss=0.1094, over 22205.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001728, whisper_loss=0.09244, over 3879802.34 frames. ], batch size: 92, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:55:36,588 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 28 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 20:55:40,756 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 34 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-12 20:55:41,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0 2024-08-12 20:55:45,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2024-08-12 20:55:47,346 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 20:55:49,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1823780.0, ans=0.125 2024-08-12 20:55:56,027 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 20:56:07,178 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-12 20:56:13,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.94 vs. limit=10.0 2024-08-12 20:56:16,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1823980.0, ans=6.0 2024-08-12 20:56:27,165 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 20:56:30,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1824080.0, ans=0.125 2024-08-12 20:56:31,147 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8500, loss[loss=0.1286, beats_loss=0.006267, ecapa_loss=0.0002013, whisper_loss=0.1203, over 23320.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001718, whisper_loss=0.09238, over 3838510.06 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:56:42,232 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 20:56:53,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1824180.0, ans=0.125 2024-08-12 20:56:54,181 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 20:56:54,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1824180.0, ans=0.0 2024-08-12 20:57:01,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.569e+01 2.792e+01 3.196e+01 4.300e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-12 20:57:26,514 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 20:57:27,840 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 20:57:31,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-08-12 20:57:32,236 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 20:57:32,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1824480.0, ans=0.2 2024-08-12 20:57:33,975 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.082e+00 2024-08-12 20:57:46,080 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8550, loss[loss=0.1035, beats_loss=0.01145, ecapa_loss=0.0001804, whisper_loss=0.09024, over 22719.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001703, whisper_loss=0.09176, over 3857363.03 frames. ], batch size: 90, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:57:46,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1824580.0, ans=0.125 2024-08-12 20:57:57,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2024-08-12 20:58:06,465 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 20:58:16,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1824780.0, ans=0.1 2024-08-12 20:58:18,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1824780.0, ans=0.125 2024-08-12 20:58:18,400 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. 
limit=15.0 2024-08-12 20:58:56,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1824980.0, ans=0.125 2024-08-12 20:58:58,962 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8600, loss[loss=0.1192, beats_loss=0.007999, ecapa_loss=0.0002063, whisper_loss=0.1091, over 19795.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001703, whisper_loss=0.09209, over 3853062.73 frames. ], batch size: 81, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:59:00,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1825080.0, ans=0.0 2024-08-12 20:59:08,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1825080.0, ans=0.125 2024-08-12 20:59:24,228 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 20:59:31,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.497e+01 2.777e+01 3.095e+01 5.281e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 20:59:45,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1825380.0, ans=0.125 2024-08-12 20:59:49,049 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 20:59:49,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1825380.0, ans=0.125 2024-08-12 20:59:53,541 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 21:00:17,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8650, loss[loss=0.1154, beats_loss=0.008898, ecapa_loss=0.0001886, whisper_loss=0.1046, over 18176.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001721, whisper_loss=0.09159, over 3814184.03 frames. ], batch size: 71, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:00:41,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1825680.0, ans=0.125 2024-08-12 21:01:01,319 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 21:01:03,071 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.074e+01 2024-08-12 21:01:04,061 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 21:01:11,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-12 21:01:12,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1825880.0, ans=0.125 2024-08-12 21:01:24,118 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 21:01:27,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1825980.0, ans=0.2 2024-08-12 21:01:28,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1825980.0, ans=10.0 2024-08-12 21:01:30,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1825980.0, ans=0.125 2024-08-12 21:01:33,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8700, loss[loss=0.1224, beats_loss=0.01103, ecapa_loss=0.0002449, whisper_loss=0.1089, over 14040.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001712, whisper_loss=0.09145, over 3814385.04 frames. ], batch size: 55, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:01:43,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2024-08-12 21:01:43,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2024-08-12 21:01:44,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1826080.0, ans=0.09899494936611666 2024-08-12 21:01:44,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1826080.0, ans=0.1 2024-08-12 21:01:59,962 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 21:02:04,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.617e+01 2.806e+01 3.109e+01 1.024e+02, threshold=5.612e+01, percent-clipped=1.0 2024-08-12 21:02:07,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1826280.0, ans=0.2 2024-08-12 21:02:09,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1826280.0, ans=0.125 2024-08-12 21:02:19,023 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 21:02:25,894 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 21:02:31,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1826380.0, ans=0.1 2024-08-12 21:02:36,413 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 21:02:50,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8750, loss[loss=0.09769, beats_loss=0.01101, ecapa_loss=0.0001461, whisper_loss=0.08522, over 14801.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001714, whisper_loss=0.09113, over 3806749.12 frames. ], batch size: 56, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:03:17,049 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 21:03:17,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-12 21:03:33,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-08-12 21:03:40,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1826880.0, ans=0.125 2024-08-12 21:03:50,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1826980.0, ans=0.125 2024-08-12 21:04:01,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. 
limit=15.0 2024-08-12 21:04:03,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1826980.0, ans=0.125 2024-08-12 21:04:08,033 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8800, loss[loss=0.08173, beats_loss=0.01515, ecapa_loss=0.0001426, whisper_loss=0.06515, over 21448.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001713, whisper_loss=0.09099, over 3854341.69 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:04:12,824 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 21:04:32,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-08-12 21:04:37,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1827180.0, ans=0.125 2024-08-12 21:04:39,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.528e+01 2.804e+01 3.159e+01 1.036e+02, threshold=5.609e+01, percent-clipped=2.0 2024-08-12 21:04:40,261 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 21:04:43,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1827280.0, ans=0.125 2024-08-12 21:04:46,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.97 vs. 
limit=22.5 2024-08-12 21:04:59,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1827380.0, ans=0.125 2024-08-12 21:05:21,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-12 21:05:22,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-12 21:05:25,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1827580.0, ans=0.125 2024-08-12 21:05:26,723 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8850, loss[loss=0.08621, beats_loss=0.01423, ecapa_loss=0.0001496, whisper_loss=0.07048, over 18784.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001713, whisper_loss=0.0914, over 3854525.29 frames. ], batch size: 77, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:05:27,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-12 21:05:44,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1827680.0, ans=0.125 2024-08-12 21:05:50,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1827680.0, ans=0.125 2024-08-12 21:05:53,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=15.0 2024-08-12 21:05:58,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1827780.0, ans=0.125 2024-08-12 21:05:59,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1827780.0, ans=0.125 2024-08-12 21:06:02,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1827780.0, ans=0.05 2024-08-12 21:06:02,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1827780.0, ans=0.0 2024-08-12 21:06:07,914 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 21:06:27,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1827980.0, ans=0.125 2024-08-12 21:06:39,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=12.0 2024-08-12 21:06:42,718 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8900, loss[loss=0.1046, beats_loss=0.01013, ecapa_loss=0.0001669, whisper_loss=0.09276, over 22985.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001719, whisper_loss=0.0919, over 3847963.49 frames. ], batch size: 85, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:06:55,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1828080.0, ans=0.0 2024-08-12 21:07:12,446 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 21:07:15,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.512e+01 2.855e+01 3.103e+01 6.109e+01, threshold=5.710e+01, percent-clipped=1.0 2024-08-12 21:07:21,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1828280.0, ans=0.2 2024-08-12 21:07:21,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1828280.0, ans=0.125 2024-08-12 21:07:21,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-08-12 21:07:29,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1828380.0, ans=0.125 2024-08-12 21:07:41,607 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 21:07:50,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1828480.0, ans=0.125 2024-08-12 21:07:53,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-12 21:07:56,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-12 21:07:59,907 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 8950, loss[loss=0.1315, beats_loss=0.008441, ecapa_loss=0.0001578, whisper_loss=0.1215, over 23230.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01101, ecapa_loss=0.0001699, whisper_loss=0.09104, over 3818365.99 frames. 
], batch size: 89, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:08:10,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1828580.0, ans=0.0 2024-08-12 21:08:21,927 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.297e+00 2024-08-12 21:08:46,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1828880.0, ans=0.125 2024-08-12 21:08:48,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1828880.0, ans=0.09899494936611666 2024-08-12 21:08:52,270 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 21:08:52,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1828880.0, ans=0.0 2024-08-12 21:08:57,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1828880.0, ans=0.0 2024-08-12 21:09:02,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-08-12 21:09:16,132 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9000, loss[loss=0.09519, beats_loss=0.01062, ecapa_loss=0.0001833, whisper_loss=0.08274, over 21786.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01105, ecapa_loss=0.0001697, whisper_loss=0.09081, over 3839715.39 frames. ], batch size: 92, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:09:16,133 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 21:09:54,938 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005776, whisper_loss=0.2483, over 922467.00 frames. 
2024-08-12 21:10:13,675 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on SV_voxceleb1: loss=0.004711, beats_loss=0, ecapa_loss=0.0004711, whisper_loss=0, over 939242.00 frames. 2024-08-12 21:10:55,400 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6799, 2.3464, 2.3705, 1.9332, 3.1312, 2.3207, 2.4523, 2.2462], device='cuda:1') 2024-08-12 21:12:02,748 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 21:12:02,752 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 21:12:12,279 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 21:12:25,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.48 vs. limit=6.0 2024-08-12 21:12:37,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.414e+01 2.685e+01 3.059e+01 6.063e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-12 21:12:46,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.66 vs. limit=22.5 2024-08-12 21:12:54,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1829380.0, ans=0.04949747468305833 2024-08-12 21:13:16,744 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 21:13:22,643 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9050, loss[loss=0.1245, beats_loss=0.009143, ecapa_loss=0.0001833, whisper_loss=0.1135, over 23033.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.000171, whisper_loss=0.09148, over 3866342.73 frames. 
], batch size: 92, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:13:24,389 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 21:13:29,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-12 21:13:36,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1829680.0, ans=0.125 2024-08-12 21:13:42,281 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 21:13:46,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1829680.0, ans=0.2 2024-08-12 21:13:52,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1829780.0, ans=0.125 2024-08-12 21:14:02,644 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 21:14:03,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1829780.0, ans=0.2 2024-08-12 21:14:07,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1829880.0, ans=0.125 2024-08-12 21:14:11,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1829880.0, ans=0.95 2024-08-12 21:14:33,139 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 21:14:33,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1829980.0, ans=0.1 2024-08-12 21:14:38,565 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9100, loss[loss=0.1191, beats_loss=0.01034, ecapa_loss=0.0001735, whisper_loss=0.107, over 23147.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001713, whisper_loss=0.09135, over 3851491.52 frames. ], batch size: 94, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:14:43,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1830080.0, ans=0.125 2024-08-12 21:14:45,024 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 19 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 21:14:45,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1830080.0, ans=0.0 2024-08-12 21:14:49,294 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 21:14:49,856 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-12 21:14:59,980 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 21:15:03,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1830180.0, ans=0.125 2024-08-12 21:15:11,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.477e+01 2.788e+01 3.055e+01 6.197e+01, threshold=5.576e+01, percent-clipped=1.0 2024-08-12 21:15:12,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1830280.0, ans=0.125 2024-08-12 21:15:13,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1830280.0, ans=0.0 2024-08-12 21:15:25,760 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 21:15:27,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-12 21:15:42,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1830480.0, ans=0.025 2024-08-12 21:15:47,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1830480.0, ans=0.07 2024-08-12 21:15:52,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1830480.0, ans=0.0 2024-08-12 21:15:56,210 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9150, loss[loss=0.1116, beats_loss=0.01138, ecapa_loss=0.0002003, whisper_loss=0.09822, over 22605.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001718, whisper_loss=0.09134, over 3835341.52 frames. 
], batch size: 94, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:15:58,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1830580.0, ans=0.125 2024-08-12 21:16:10,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1830680.0, ans=0.125 2024-08-12 21:16:17,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1830680.0, ans=0.125 2024-08-12 21:16:28,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:16:36,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:17:10,381 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9200, loss[loss=0.08995, beats_loss=0.01281, ecapa_loss=0.0001876, whisper_loss=0.07527, over 21638.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01095, ecapa_loss=0.0001729, whisper_loss=0.09061, over 3864980.41 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:17:15,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1831080.0, ans=0.125 2024-08-12 21:17:24,433 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 21:17:37,888 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 21:17:42,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.481e+01 2.738e+01 3.160e+01 4.519e+01, threshold=5.476e+01, percent-clipped=0.0 2024-08-12 21:17:53,343 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
26 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 21:18:17,988 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 21:18:26,421 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9250, loss[loss=0.09541, beats_loss=0.01185, ecapa_loss=0.0001577, whisper_loss=0.08198, over 18427.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01098, ecapa_loss=0.0001727, whisper_loss=0.09074, over 3889129.79 frames. ], batch size: 74, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:18:31,624 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 21:18:36,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1831580.0, ans=0.125 2024-08-12 21:19:03,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1831780.0, ans=0.125 2024-08-12 21:19:08,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=22.5 2024-08-12 21:19:41,607 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9300, loss[loss=0.1102, beats_loss=0.0112, ecapa_loss=0.0001556, whisper_loss=0.09742, over 18340.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.000173, whisper_loss=0.09138, over 3901068.04 frames. ], batch size: 72, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:20:11,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.613e+01 2.997e+01 3.337e+01 4.853e+01, threshold=5.993e+01, percent-clipped=0.0 2024-08-12 21:20:28,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.84 vs. 
limit=12.0 2024-08-12 21:20:35,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1832380.0, ans=0.125 2024-08-12 21:20:40,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1832480.0, ans=0.0 2024-08-12 21:20:54,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9350, loss[loss=0.1282, beats_loss=0.009975, ecapa_loss=0.0001698, whisper_loss=0.1165, over 23053.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001723, whisper_loss=0.09144, over 3873644.36 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:21:10,496 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 21:21:28,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1832780.0, ans=0.0 2024-08-12 21:21:29,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1832780.0, ans=0.125 2024-08-12 21:21:44,831 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 17 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 21:21:48,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1832880.0, ans=0.0 2024-08-12 21:21:56,282 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 21:22:05,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2024-08-12 21:22:08,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9400, loss[loss=0.1031, beats_loss=0.01111, ecapa_loss=0.0001606, whisper_loss=0.09039, over 20001.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.000174, whisper_loss=0.09123, over 3867767.30 frames. ], batch size: 82, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:22:14,204 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-12 21:22:26,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1833180.0, ans=0.05 2024-08-12 21:22:40,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.361e+01 2.679e+01 2.977e+01 4.432e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-12 21:22:42,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1833280.0, ans=0.125 2024-08-12 21:22:46,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1833280.0, ans=0.2 2024-08-12 21:22:56,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2024-08-12 21:23:15,675 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 21:23:24,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9450, loss[loss=0.1151, beats_loss=0.01141, ecapa_loss=0.0001359, whisper_loss=0.1024, over 23248.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001744, whisper_loss=0.09151, over 3866787.85 frames. ], batch size: 90, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:23:42,880 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 21:23:45,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1833680.0, ans=0.125 2024-08-12 21:23:58,365 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 21:24:35,373 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 21:24:39,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9500, loss[loss=0.1141, beats_loss=0.009643, ecapa_loss=0.0001414, whisper_loss=0.103, over 18510.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001734, whisper_loss=0.0907, over 3859469.39 frames. ], batch size: 69, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:25:09,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.426e+01 2.699e+01 3.219e+01 5.763e+01, threshold=5.398e+01, percent-clipped=1.0 2024-08-12 21:25:22,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1834380.0, ans=0.125 2024-08-12 21:25:26,390 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 21:25:50,203 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9550, loss[loss=0.1132, beats_loss=0.01069, ecapa_loss=0.0001585, whisper_loss=0.1009, over 20235.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01093, ecapa_loss=0.0001722, whisper_loss=0.09085, over 3851386.46 frames. ], batch size: 78, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:25:59,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1834580.0, ans=0.125 2024-08-12 21:26:11,944 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 21:26:30,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-08-12 21:26:38,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1834880.0, ans=6.0 2024-08-12 21:26:46,715 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-12 21:27:01,635 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9600, loss[loss=0.08422, beats_loss=0.01424, ecapa_loss=0.0001464, whisper_loss=0.06852, over 17540.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001729, whisper_loss=0.09052, over 3828193.49 frames. ], batch size: 71, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:27:17,348 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.829e+00 2024-08-12 21:27:22,384 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 21:27:30,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.578e+01 2.916e+01 3.452e+01 6.223e+01, threshold=5.833e+01, percent-clipped=1.0 2024-08-12 21:27:40,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1835280.0, ans=0.125 2024-08-12 21:27:42,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1835380.0, ans=0.0 2024-08-12 21:28:03,680 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 21:28:10,165 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9650, loss[loss=0.1226, beats_loss=0.008806, ecapa_loss=0.0001829, whisper_loss=0.112, over 20002.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001725, whisper_loss=0.09114, over 3817563.97 frames. ], batch size: 82, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:28:10,332 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 21:28:14,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1835580.0, ans=0.125 2024-08-12 21:28:16,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-12 21:28:31,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1835680.0, ans=0.0 2024-08-12 21:28:36,619 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 21:28:37,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=12.0 2024-08-12 21:28:50,800 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 24 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-12 21:28:58,813 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 21:28:59,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-12 21:29:04,735 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
20 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 21:29:12,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1835980.0, ans=0.1 2024-08-12 21:29:16,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1835980.0, ans=0.1 2024-08-12 21:29:19,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9700, loss[loss=0.08361, beats_loss=0.009948, ecapa_loss=0.0001899, whisper_loss=0.07176, over 15747.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.09108, over 3817839.35 frames. ], batch size: 65, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:29:37,561 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 16 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 21:29:38,770 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 21:29:46,470 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 21:29:48,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.429e+01 2.686e+01 3.028e+01 5.758e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-12 21:29:51,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2024-08-12 21:29:52,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1836280.0, ans=0.0 2024-08-12 21:29:58,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1836280.0, ans=0.0 2024-08-12 21:30:28,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1836480.0, ans=0.125 2024-08-12 21:30:30,597 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9750, loss[loss=0.1068, beats_loss=0.01008, ecapa_loss=0.0001752, whisper_loss=0.09494, over 22869.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001721, whisper_loss=0.09123, over 3824363.40 frames. ], batch size: 90, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:30:33,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.52 vs. limit=22.5 2024-08-12 21:30:54,388 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 21:31:03,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-08-12 21:31:04,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1836780.0, ans=0.0 2024-08-12 21:31:08,458 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 21:31:27,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1836980.0, ans=0.0 2024-08-12 21:31:33,939 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 21:31:36,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1836980.0, ans=0.125 2024-08-12 21:31:42,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9800, loss[loss=0.127, beats_loss=0.008158, ecapa_loss=0.0002004, whisper_loss=0.1169, over 19564.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001706, whisper_loss=0.091, over 3836436.95 frames. ], batch size: 74, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:31:45,715 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-12 21:31:48,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1837080.0, ans=0.0 2024-08-12 21:32:12,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.082e+01 2.453e+01 2.781e+01 3.151e+01 8.550e+01, threshold=5.562e+01, percent-clipped=1.0 2024-08-12 21:32:21,946 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 21:32:22,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-12 21:32:27,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs. 
limit=15.0 2024-08-12 21:32:32,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1837380.0, ans=0.125 2024-08-12 21:32:42,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1837480.0, ans=0.125 2024-08-12 21:32:55,397 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9850, loss[loss=0.1179, beats_loss=0.01148, ecapa_loss=0.0001855, whisper_loss=0.1046, over 16590.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.000172, whisper_loss=0.09178, over 3841584.84 frames. ], batch size: 68, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:32:56,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1837580.0, ans=0.125 2024-08-12 21:33:06,652 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0 2024-08-12 21:33:11,528 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 21:33:19,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1837680.0, ans=0.125 2024-08-12 21:33:39,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1837880.0, ans=0.125 2024-08-12 21:33:39,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1837880.0, ans=0.125 2024-08-12 21:33:41,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1837880.0, ans=0.0 2024-08-12 21:33:51,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1837980.0, ans=0.125 2024-08-12 21:33:59,502 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 21:33:59,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1837980.0, ans=0.1 2024-08-12 21:34:03,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.62 vs. limit=10.0 2024-08-12 21:34:06,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9900, loss[loss=0.08915, beats_loss=0.01348, ecapa_loss=0.0001027, whisper_loss=0.07464, over 14315.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.000171, whisper_loss=0.09151, over 3826426.13 frames. ], batch size: 54, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:34:19,147 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 21:34:36,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.567e+01 2.799e+01 3.140e+01 5.231e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 21:34:46,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1838280.0, ans=10.0 2024-08-12 21:34:49,485 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 21:34:52,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1838380.0, ans=0.0 2024-08-12 21:34:53,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1838380.0, ans=0.1 2024-08-12 21:35:11,379 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 21:35:20,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1838580.0, ans=0.5 2024-08-12 21:35:21,876 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 9950, loss[loss=0.08908, beats_loss=0.01333, ecapa_loss=0.0001388, whisper_loss=0.07436, over 14117.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001729, whisper_loss=0.09202, over 3851693.32 frames. ], batch size: 55, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:35:23,455 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 21:35:33,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1838580.0, ans=0.0 2024-08-12 21:36:13,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=12.0 2024-08-12 21:36:14,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1838880.0, ans=0.025 2024-08-12 21:36:16,272 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 21:36:18,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2024-08-12 21:36:28,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1838980.0, ans=0.125 2024-08-12 21:36:36,550 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10000, loss[loss=0.1147, beats_loss=0.01196, ecapa_loss=0.0001701, whisper_loss=0.101, over 17884.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001735, whisper_loss=0.09253, over 3848752.91 frames. ], batch size: 71, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:36:44,156 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 21:36:59,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1839180.0, ans=0.1 2024-08-12 21:37:06,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.540e+01 2.812e+01 3.144e+01 2.734e+02, threshold=5.624e+01, percent-clipped=2.0 2024-08-12 21:37:16,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1839280.0, ans=0.2 2024-08-12 21:37:17,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1839280.0, ans=0.07 2024-08-12 21:37:48,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10050, loss[loss=0.0795, beats_loss=0.01522, ecapa_loss=0.0001428, whisper_loss=0.06285, over 20317.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001736, whisper_loss=0.09199, over 3880265.46 frames. ], batch size: 82, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:37:57,707 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 21:38:07,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2024-08-12 21:38:17,869 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 21:38:24,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1839780.0, ans=0.0 2024-08-12 21:38:31,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1839780.0, ans=0.125 2024-08-12 21:38:32,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1839780.0, ans=0.2 2024-08-12 21:39:03,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1839980.0, ans=0.2 2024-08-12 21:39:12,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10100, loss[loss=0.09918, beats_loss=0.01263, ecapa_loss=0.0001164, whisper_loss=0.08539, over 23953.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001718, whisper_loss=0.09205, over 3897860.83 frames. ], batch size: 93, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:39:23,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1840080.0, ans=0.0 2024-08-12 21:39:25,561 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 31 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 21:39:40,853 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 21:39:45,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.534e+01 2.755e+01 3.172e+01 9.610e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-12 21:39:45,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1840280.0, ans=0.0 2024-08-12 21:39:45,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. 
limit=22.5 2024-08-12 21:40:19,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1840480.0, ans=0.1 2024-08-12 21:40:27,779 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 21:40:34,876 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10150, loss[loss=0.1084, beats_loss=0.008942, ecapa_loss=0.0001744, whisper_loss=0.09769, over 18340.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001734, whisper_loss=0.09154, over 3900928.51 frames. ], batch size: 73, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:40:41,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1840580.0, ans=0.125 2024-08-12 21:40:53,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-12 21:41:07,088 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 21:41:21,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1840780.0, ans=0.1 2024-08-12 21:41:32,162 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 21:41:34,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1840880.0, ans=0.1 2024-08-12 21:41:56,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1840980.0, ans=0.1 2024-08-12 21:42:08,509 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10200, loss[loss=0.09512, beats_loss=0.01094, ecapa_loss=0.0002009, whisper_loss=0.08218, over 19635.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001728, whisper_loss=0.09068, over 3898292.03 frames. ], batch size: 84, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:42:20,440 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-12 21:42:53,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.450e+01 2.670e+01 3.042e+01 4.548e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-12 21:43:02,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1841280.0, ans=0.125 2024-08-12 21:43:14,182 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 21:43:26,755 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 21:43:52,307 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2024-08-12 21:43:57,402 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10250, loss[loss=0.09685, beats_loss=0.01003, ecapa_loss=0.000166, whisper_loss=0.08516, over 22370.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001728, whisper_loss=0.09126, over 3916360.46 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:44:00,119 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 21:44:12,283 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 21:44:12,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1841580.0, ans=0.0 2024-08-12 21:44:21,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1841680.0, ans=0.125 2024-08-12 21:44:25,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1841680.0, ans=0.025 2024-08-12 21:44:50,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2024-08-12 21:44:58,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1841780.0, ans=0.125 2024-08-12 21:45:08,955 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-12 21:45:10,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1841880.0, ans=0.125 2024-08-12 21:45:15,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.02 vs. limit=10.0 2024-08-12 21:45:47,180 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10300, loss[loss=0.09722, beats_loss=0.01069, ecapa_loss=0.0001941, whisper_loss=0.08459, over 21880.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01095, ecapa_loss=0.0001719, whisper_loss=0.09074, over 3930858.73 frames. 
], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:46:26,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1842180.0, ans=0.1 2024-08-12 21:46:37,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.506e+01 2.751e+01 3.160e+01 4.441e+01, threshold=5.501e+01, percent-clipped=0.0 2024-08-12 21:46:54,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1842280.0, ans=0.2 2024-08-12 21:47:19,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1842480.0, ans=0.2 2024-08-12 21:47:33,632 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10350, loss[loss=0.09808, beats_loss=0.01179, ecapa_loss=0.0001422, whisper_loss=0.08487, over 22757.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01099, ecapa_loss=0.0001718, whisper_loss=0.0902, over 3925251.69 frames. ], batch size: 93, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:47:34,161 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.768e-01 2024-08-12 21:47:39,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1842580.0, ans=0.1 2024-08-12 21:47:55,306 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 21:47:55,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1842680.0, ans=0.0 2024-08-12 21:48:11,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1842780.0, ans=0.0 2024-08-12 21:48:37,304 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 21:48:37,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1842980.0, ans=0.0 2024-08-12 21:48:45,870 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10400, loss[loss=0.1066, beats_loss=0.01032, ecapa_loss=0.0001709, whisper_loss=0.09458, over 20480.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.011, ecapa_loss=0.0001718, whisper_loss=0.09022, over 3904569.50 frames. ], batch size: 83, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:48:49,139 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 21:48:52,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1843080.0, ans=0.125 2024-08-12 21:48:52,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1843080.0, ans=0.0 2024-08-12 21:48:54,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1843080.0, ans=0.0 2024-08-12 21:49:11,239 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 15 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 21:49:16,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.438e+01 2.753e+01 3.076e+01 5.598e+01, threshold=5.505e+01, percent-clipped=1.0 2024-08-12 21:49:21,135 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 21:49:22,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1843280.0, ans=0.1 2024-08-12 21:49:26,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. 
limit=22.5 2024-08-12 21:49:28,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1843380.0, ans=0.1 2024-08-12 21:49:35,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-12 21:49:38,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1843380.0, ans=0.125 2024-08-12 21:49:47,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1843480.0, ans=0.125 2024-08-12 21:49:47,997 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 21:49:48,424 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.533e-01 2024-08-12 21:49:50,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-12 21:49:53,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1843480.0, ans=0.125 2024-08-12 21:49:59,581 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10450, loss[loss=0.08593, beats_loss=0.0121, ecapa_loss=0.0001701, whisper_loss=0.07213, over 21227.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01099, ecapa_loss=0.0001707, whisper_loss=0.0902, over 3884923.52 frames. 
], batch size: 87, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:50:06,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1843580.0, ans=0.125 2024-08-12 21:50:10,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1843580.0, ans=0.0 2024-08-12 21:50:20,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=22.5 2024-08-12 21:50:21,573 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 21:50:35,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2024-08-12 21:50:45,791 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 21:51:03,076 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 21:51:03,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1843980.0, ans=0.0 2024-08-12 21:51:09,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1843980.0, ans=0.125 2024-08-12 21:51:10,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1843980.0, ans=0.1 2024-08-12 21:51:14,310 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10500, loss[loss=0.1032, beats_loss=0.01153, ecapa_loss=0.0001757, whisper_loss=0.08994, over 23387.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01093, ecapa_loss=0.0001705, whisper_loss=0.09019, over 3857652.51 frames. 
], batch size: 96, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:51:19,344 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 21:51:27,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2024-08-12 21:51:30,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1844180.0, ans=0.125 2024-08-12 21:51:32,407 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 21:51:32,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1844180.0, ans=0.0 2024-08-12 21:51:45,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.358e+01 2.688e+01 3.093e+01 1.105e+02, threshold=5.376e+01, percent-clipped=1.0 2024-08-12 21:51:56,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1844280.0, ans=0.125 2024-08-12 21:52:20,234 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 21:52:22,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1844480.0, ans=0.1 2024-08-12 21:52:30,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10550, loss[loss=0.1022, beats_loss=0.00982, ecapa_loss=0.0002062, whisper_loss=0.09034, over 18236.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001716, whisper_loss=0.09049, over 3848555.43 frames. 
], batch size: 75, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:52:32,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1844580.0, ans=0.125 2024-08-12 21:53:09,570 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 21:53:19,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1844880.0, ans=0.1 2024-08-12 21:53:43,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1844980.0, ans=0.0 2024-08-12 21:53:48,909 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10600, loss[loss=0.1054, beats_loss=0.01006, ecapa_loss=0.0001639, whisper_loss=0.09366, over 18779.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.0001711, whisper_loss=0.09067, over 3844293.54 frames. ], batch size: 74, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:54:06,820 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 21:54:11,288 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:54:12,913 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. 
limit=22.5 2024-08-12 21:54:13,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1845180.0, ans=0.2 2024-08-12 21:54:21,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.510e+01 2.765e+01 3.245e+01 5.665e+01, threshold=5.530e+01, percent-clipped=1.0 2024-08-12 21:54:23,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1845280.0, ans=0.125 2024-08-12 21:54:30,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-12 21:54:35,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1845380.0, ans=0.0 2024-08-12 21:54:35,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1845380.0, ans=0.125 2024-08-12 21:54:57,405 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 21:55:04,072 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10650, loss[loss=0.115, beats_loss=0.01181, ecapa_loss=0.0001522, whisper_loss=0.1017, over 23003.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001713, whisper_loss=0.09139, over 3842504.64 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:55:11,949 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 21:55:12,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1845580.0, ans=0.1 2024-08-12 21:55:39,599 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 21:55:44,802 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
35 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 21:56:23,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10700, loss[loss=0.1061, beats_loss=0.009164, ecapa_loss=0.0001442, whisper_loss=0.09545, over 17070.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01098, ecapa_loss=0.0001703, whisper_loss=0.0919, over 3875917.65 frames. ], batch size: 65, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:56:27,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1846080.0, ans=0.125 2024-08-12 21:56:34,680 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 21:56:39,027 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 21:56:51,648 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 21:56:55,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.619e+01 2.989e+01 3.264e+01 5.454e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-12 21:56:55,725 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 21:57:25,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1846480.0, ans=0.5 2024-08-12 21:57:37,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1846480.0, ans=0.125 2024-08-12 21:57:40,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10750, loss[loss=0.1022, beats_loss=0.01096, ecapa_loss=0.0001426, whisper_loss=0.08977, over 18405.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01093, ecapa_loss=0.0001697, whisper_loss=0.09291, over 3877256.49 frames. 
], batch size: 71, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:57:46,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2024-08-12 21:57:48,856 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 21:58:01,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1846680.0, ans=0.0 2024-08-12 21:58:01,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1846680.0, ans=0.125 2024-08-12 21:58:17,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1846780.0, ans=0.125 2024-08-12 21:58:21,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1846880.0, ans=0.025 2024-08-12 21:58:31,640 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 21:58:51,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-08-12 21:58:53,656 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10800, loss[loss=0.09491, beats_loss=0.01083, ecapa_loss=0.0001732, whisper_loss=0.08235, over 15341.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001706, whisper_loss=0.09269, over 3865858.52 frames. 
], batch size: 61, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:59:23,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.464e+01 2.831e+01 3.292e+01 5.711e+01, threshold=5.661e+01, percent-clipped=0.0 2024-08-12 21:59:49,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-12 21:59:52,049 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:59:54,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1847480.0, ans=0.0 2024-08-12 21:59:59,437 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 30 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 22:00:03,717 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 22:00:05,179 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10850, loss[loss=0.0833, beats_loss=0.01304, ecapa_loss=0.0001373, whisper_loss=0.06889, over 15537.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01098, ecapa_loss=0.0001707, whisper_loss=0.09212, over 3896259.72 frames. ], batch size: 62, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:00:32,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1847780.0, ans=0.125 2024-08-12 22:00:43,610 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 22:00:52,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1847880.0, ans=10.0 2024-08-12 22:00:53,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1847880.0, ans=0.125 2024-08-12 22:00:54,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1847880.0, ans=0.125 2024-08-12 22:01:04,545 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-12 22:01:11,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2024-08-12 22:01:16,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=12.0 2024-08-12 22:01:16,679 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. limit=6.0 2024-08-12 22:01:17,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10900, loss[loss=0.1245, beats_loss=0.009576, ecapa_loss=0.0001743, whisper_loss=0.1132, over 22583.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01098, ecapa_loss=0.0001711, whisper_loss=0.0929, over 3919331.04 frames. 
], batch size: 88, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:01:22,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1848080.0, ans=0.2 2024-08-12 22:01:47,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1848280.0, ans=0.125 2024-08-12 22:01:48,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.539e+01 2.752e+01 3.152e+01 5.586e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 22:01:58,918 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 22:02:02,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1848380.0, ans=0.05 2024-08-12 22:02:06,165 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 22:02:08,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1848380.0, ans=0.2 2024-08-12 22:02:15,597 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:02:25,033 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:02:27,470 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 22:02:32,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 10950, loss[loss=0.1212, beats_loss=0.01098, ecapa_loss=0.0001247, whisper_loss=0.1089, over 23668.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01092, ecapa_loss=0.0001706, whisper_loss=0.09332, over 3911536.56 frames. 
], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:02:42,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-12 22:02:47,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1848680.0, ans=0.125 2024-08-12 22:03:01,954 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 22:03:03,284 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 39 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 22:03:03,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1848780.0, ans=0.0 2024-08-12 22:03:24,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1848880.0, ans=0.125 2024-08-12 22:03:32,595 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 22:03:47,203 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11000, loss[loss=0.07593, beats_loss=0.01063, ecapa_loss=0.0002139, whisper_loss=0.06316, over 16121.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001712, whisper_loss=0.093, over 3952312.62 frames. ], batch size: 70, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:03:47,473 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 22:03:57,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1849080.0, ans=0.125 2024-08-12 22:04:03,262 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 22:04:09,351 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-12 22:04:12,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1849180.0, ans=0.125 2024-08-12 22:04:18,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.465e+01 2.797e+01 3.199e+01 6.867e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 22:04:37,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1849380.0, ans=0.1 2024-08-12 22:04:58,873 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11050, loss[loss=0.1006, beats_loss=0.01034, ecapa_loss=0.0001373, whisper_loss=0.08893, over 15482.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001711, whisper_loss=0.09199, over 3940055.20 frames. ], batch size: 55, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:05:02,996 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 19 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 22:05:03,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2024-08-12 22:05:23,166 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 22:05:28,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1849780.0, ans=0.125 2024-08-12 22:06:11,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11100, loss[loss=0.1127, beats_loss=0.008365, ecapa_loss=0.0002106, whisper_loss=0.1022, over 21513.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001717, whisper_loss=0.0923, over 3916279.43 frames. 
], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:06:29,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1850180.0, ans=0.125 2024-08-12 22:06:44,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.458e+01 2.677e+01 3.068e+01 5.581e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-12 22:06:52,489 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-12 22:06:59,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1850380.0, ans=0.035 2024-08-12 22:07:26,807 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11150, loss[loss=0.1035, beats_loss=0.009289, ecapa_loss=0.0001811, whisper_loss=0.09242, over 22077.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0001712, whisper_loss=0.09278, over 3939178.37 frames. ], batch size: 88, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:07:34,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2024-08-12 22:07:36,219 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 22:07:40,715 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 22:07:43,722 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 22:07:50,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.16 vs. 
limit=15.0 2024-08-12 22:07:58,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1850780.0, ans=0.125 2024-08-12 22:08:12,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1850880.0, ans=10.0 2024-08-12 22:08:19,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1850880.0, ans=0.125 2024-08-12 22:08:21,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1850880.0, ans=0.125 2024-08-12 22:08:24,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1850880.0, ans=0.0 2024-08-12 22:08:41,522 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11200, loss[loss=0.1117, beats_loss=0.01102, ecapa_loss=0.0001429, whisper_loss=0.09926, over 23688.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01075, ecapa_loss=0.0001721, whisper_loss=0.09287, over 3930037.33 frames. ], batch size: 91, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:08:43,051 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 22:08:51,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1851080.0, ans=0.125 2024-08-12 22:08:55,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-08-12 22:08:59,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1851180.0, ans=0.125 2024-08-12 22:09:08,421 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 22:09:11,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=22.5 2024-08-12 22:09:14,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.512e+01 2.839e+01 3.173e+01 1.150e+02, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 22:09:14,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1851280.0, ans=0.0 2024-08-12 22:09:44,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1851480.0, ans=0.125 2024-08-12 22:09:47,193 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 22:10:00,848 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11250, loss[loss=0.09735, beats_loss=0.009831, ecapa_loss=0.0001544, whisper_loss=0.08597, over 14472.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01077, ecapa_loss=0.0001715, whisper_loss=0.09257, over 3882597.20 frames. ], batch size: 55, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:10:02,234 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 26 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-12 22:10:19,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1851680.0, ans=0.95 2024-08-12 22:11:03,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1851980.0, ans=0.1 2024-08-12 22:11:17,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.88 vs. 
limit=15.0 2024-08-12 22:11:18,059 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11300, loss[loss=0.1071, beats_loss=0.01128, ecapa_loss=0.0001612, whisper_loss=0.09417, over 23301.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01075, ecapa_loss=0.0001704, whisper_loss=0.0927, over 3869877.63 frames. ], batch size: 93, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:11:41,919 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.236e-02 2024-08-12 22:11:54,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1852280.0, ans=0.0 2024-08-12 22:11:55,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.542e+01 2.832e+01 3.166e+01 7.074e+01, threshold=5.665e+01, percent-clipped=1.0 2024-08-12 22:12:24,807 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 22:12:25,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.20 vs. limit=15.0 2024-08-12 22:12:39,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1852580.0, ans=0.1 2024-08-12 22:12:40,704 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11350, loss[loss=0.1023, beats_loss=0.01341, ecapa_loss=0.0001525, whisper_loss=0.08734, over 17997.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0108, ecapa_loss=0.0001697, whisper_loss=0.09277, over 3884100.57 frames. ], batch size: 70, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:12:49,549 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 22:12:49,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1852580.0, ans=0.0 2024-08-12 22:13:20,163 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 22:13:37,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=12.0 2024-08-12 22:13:49,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1852980.0, ans=0.1 2024-08-12 22:13:54,708 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-12 22:14:01,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11400, loss[loss=0.1058, beats_loss=0.01146, ecapa_loss=0.0001525, whisper_loss=0.09282, over 19791.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01074, ecapa_loss=0.0001701, whisper_loss=0.09333, over 3878314.67 frames. ], batch size: 77, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:14:10,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1853080.0, ans=0.0 2024-08-12 22:14:31,807 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 22:14:36,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.651e+01 3.000e+01 3.420e+01 5.421e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-12 22:15:01,208 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
18 from LS+wenet, 28 from Vox, 47 fro AS 2024-08-12 22:15:16,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1853480.0, ans=0.0 2024-08-12 22:15:19,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11450, loss[loss=0.09376, beats_loss=0.01247, ecapa_loss=0.0001893, whisper_loss=0.0794, over 20417.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01076, ecapa_loss=0.0001689, whisper_loss=0.09381, over 3898770.88 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:15:20,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1853580.0, ans=0.0 2024-08-12 22:15:23,008 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 22:15:32,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-12 22:15:33,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1853580.0, ans=0.2 2024-08-12 22:15:35,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.42 vs. 
limit=15.0 2024-08-12 22:16:04,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1853780.0, ans=22.5 2024-08-12 22:16:10,563 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.167e+00 2024-08-12 22:16:20,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1853880.0, ans=0.125 2024-08-12 22:16:41,169 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11500, loss[loss=0.08674, beats_loss=0.01176, ecapa_loss=0.0001731, whisper_loss=0.07325, over 20255.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01085, ecapa_loss=0.0001679, whisper_loss=0.09264, over 3878920.14 frames. ], batch size: 81, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:16:48,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1854080.0, ans=0.0 2024-08-12 22:16:52,238 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 22:17:14,927 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 22:17:17,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.445e+01 2.643e+01 2.952e+01 4.086e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-12 22:17:22,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1854280.0, ans=0.125 2024-08-12 22:17:28,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1854380.0, ans=0.125 2024-08-12 22:17:36,438 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 22:17:51,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1854480.0, ans=0.1 2024-08-12 22:17:54,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1854480.0, ans=0.125 2024-08-12 22:18:01,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1854580.0, ans=0.2 2024-08-12 22:18:03,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11550, loss[loss=0.1047, beats_loss=0.01048, ecapa_loss=0.0001435, whisper_loss=0.09278, over 15183.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001686, whisper_loss=0.09251, over 3874705.46 frames. ], batch size: 59, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:18:10,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1854580.0, ans=0.125 2024-08-12 22:18:15,207 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 22:18:22,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-08-12 22:18:41,184 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 22:18:50,613 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 22:18:55,272 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 22:19:05,726 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 22:19:12,192 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 22:19:24,470 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11600, loss[loss=0.1186, beats_loss=0.009606, ecapa_loss=0.0001626, whisper_loss=0.1073, over 23352.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001704, whisper_loss=0.09198, over 3889599.65 frames. ], batch size: 92, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:19:25,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2024-08-12 22:19:33,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1855080.0, ans=0.2 2024-08-12 22:19:34,807 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 22:19:42,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1855180.0, ans=0.125 2024-08-12 22:20:00,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.514e+01 2.737e+01 3.107e+01 4.746e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 22:20:13,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1855380.0, ans=0.125 2024-08-12 22:20:15,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1855380.0, ans=0.0 2024-08-12 22:20:18,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-12 22:20:19,327 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 22:20:21,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1855380.0, ans=0.07 2024-08-12 22:20:43,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11650, loss[loss=0.1243, beats_loss=0.009645, ecapa_loss=0.0001431, whisper_loss=0.1132, over 21073.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01081, ecapa_loss=0.0001703, whisper_loss=0.09278, over 3894356.07 frames. ], batch size: 79, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:20:46,965 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 21 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 22:20:54,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-12 22:21:17,629 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:21:20,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1855780.0, ans=0.125 2024-08-12 22:21:38,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1855880.0, ans=0.035 2024-08-12 22:21:44,915 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 22:22:03,153 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11700, loss[loss=0.1125, beats_loss=0.008625, ecapa_loss=0.0002004, whisper_loss=0.1019, over 21887.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001703, whisper_loss=0.0925, over 3895620.57 frames. 
], batch size: 89, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:22:05,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1856080.0, ans=0.2 2024-08-12 22:22:06,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1856080.0, ans=0.125 2024-08-12 22:22:13,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1856080.0, ans=0.2 2024-08-12 22:22:23,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1856180.0, ans=0.125 2024-08-12 22:22:25,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1856180.0, ans=0.125 2024-08-12 22:22:29,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1856180.0, ans=0.125 2024-08-12 22:22:39,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.485e+01 2.712e+01 3.027e+01 7.497e+01, threshold=5.424e+01, percent-clipped=1.0 2024-08-12 22:22:50,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1856280.0, ans=0.125 2024-08-12 22:23:13,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-12 22:23:27,307 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11750, loss[loss=0.1151, beats_loss=0.00816, ecapa_loss=0.0002091, whisper_loss=0.1049, over 16915.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001706, whisper_loss=0.09217, over 3885789.53 frames. 
], batch size: 70, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:23:37,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1856580.0, ans=0.0 2024-08-12 22:24:01,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1856780.0, ans=0.125 2024-08-12 22:24:25,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1856880.0, ans=0.125 2024-08-12 22:24:27,112 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 22:24:29,846 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 22:24:30,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1856980.0, ans=0.02 2024-08-12 22:24:31,496 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 22:24:38,696 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 22:24:43,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1856980.0, ans=0.5 2024-08-12 22:24:45,746 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11800, loss[loss=0.09832, beats_loss=0.01177, ecapa_loss=0.0001723, whisper_loss=0.08483, over 19335.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001708, whisper_loss=0.09196, over 3882934.70 frames. 
], batch size: 79, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:25:01,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1857180.0, ans=0.125 2024-08-12 22:25:08,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1857180.0, ans=0.0 2024-08-12 22:25:20,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1857280.0, ans=0.125 2024-08-12 22:25:21,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.559e+01 2.833e+01 3.342e+01 5.764e+01, threshold=5.666e+01, percent-clipped=1.0 2024-08-12 22:25:46,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1857380.0, ans=0.125 2024-08-12 22:25:59,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1857480.0, ans=0.125 2024-08-12 22:26:06,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11850, loss[loss=0.1178, beats_loss=0.01142, ecapa_loss=0.000147, whisper_loss=0.105, over 14528.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001695, whisper_loss=0.09212, over 3892424.63 frames. ], batch size: 54, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:26:10,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1857580.0, ans=0.0 2024-08-12 22:26:21,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1857680.0, ans=0.125 2024-08-12 22:26:33,648 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 22:26:41,935 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2024-08-12 22:26:55,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1857880.0, ans=0.125 2024-08-12 22:27:08,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-12 22:27:15,845 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.643e-01 2024-08-12 22:27:22,086 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11900, loss[loss=0.0891, beats_loss=0.01238, ecapa_loss=0.0001643, whisper_loss=0.07508, over 20953.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001701, whisper_loss=0.09241, over 3893987.69 frames. ], batch size: 87, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:27:46,290 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 22:27:52,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.540e+01 2.783e+01 3.070e+01 4.680e+01, threshold=5.566e+01, percent-clipped=0.0 2024-08-12 22:28:00,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1858280.0, ans=0.0 2024-08-12 22:28:20,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1858480.0, ans=0.2 2024-08-12 22:28:22,996 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
17 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 22:28:25,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1858480.0, ans=0.0 2024-08-12 22:28:28,195 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 22:28:29,504 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-12 22:28:31,941 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 11950, loss[loss=0.1008, beats_loss=0.01129, ecapa_loss=0.0001552, whisper_loss=0.08795, over 23204.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.0001722, whisper_loss=0.09202, over 3854432.49 frames. ], batch size: 90, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:28:38,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1858580.0, ans=0.125 2024-08-12 22:28:47,515 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 22:29:29,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1858980.0, ans=0.125 2024-08-12 22:29:38,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1859080.0, ans=0.2 2024-08-12 22:29:39,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12000, loss[loss=0.07706, beats_loss=0.009825, ecapa_loss=0.000173, whisper_loss=0.0655, over 14808.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01092, ecapa_loss=0.0001711, whisper_loss=0.09235, over 3877190.99 frames. 
], batch size: 58, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:29:39,562 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 22:30:19,841 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0005805, whisper_loss=0.2504, over 922467.00 frames. 2024-08-12 22:30:37,787 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on SV_voxceleb1: loss=0.004691, beats_loss=0, ecapa_loss=0.0004691, whisper_loss=0, over 939242.00 frames. 2024-08-12 22:31:51,229 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4481, 3.0024, 3.1248, 2.8705], device='cuda:1') 2024-08-12 22:32:33,532 INFO [train_multi_KD3.py:1149] (1/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 22:32:33,535 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 22:32:45,966 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.14 vs. limit=10.0 2024-08-12 22:32:51,438 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 22:33:03,068 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-08-12 22:33:03,879 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 22:33:04,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.535e+01 2.857e+01 3.270e+01 5.667e+01, threshold=5.714e+01, percent-clipped=0.0 2024-08-12 22:33:11,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1859280.0, ans=0.0 2024-08-12 22:33:13,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. limit=6.0 2024-08-12 22:33:18,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1859380.0, ans=0.125 2024-08-12 22:33:23,580 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 22:33:25,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1859380.0, ans=0.0 2024-08-12 22:33:29,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1859380.0, ans=0.125 2024-08-12 22:33:35,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1859480.0, ans=0.125 2024-08-12 22:33:46,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12050, loss[loss=0.1013, beats_loss=0.01207, ecapa_loss=0.0001462, whisper_loss=0.0878, over 20368.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001706, whisper_loss=0.09176, over 3839133.70 frames. 
], batch size: 79, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:34:07,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1859680.0, ans=0.125 2024-08-12 22:34:14,966 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-12 22:34:27,705 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 22:34:46,045 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 22:34:53,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1859980.0, ans=0.125 2024-08-12 22:34:57,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1860080.0, ans=0.1 2024-08-12 22:34:58,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12100, loss[loss=0.1174, beats_loss=0.01052, ecapa_loss=0.0001521, whisper_loss=0.1054, over 23661.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001713, whisper_loss=0.09224, over 3823830.51 frames. ], batch size: 90, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:35:04,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1860080.0, ans=0.125 2024-08-12 22:35:06,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1860080.0, ans=0.125 2024-08-12 22:35:25,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1860280.0, ans=0.125 2024-08-12 22:35:26,031 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. 
limit=15.0 2024-08-12 22:35:27,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.529e+01 2.799e+01 3.028e+01 6.026e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 22:35:28,112 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 22:35:39,889 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 22:35:42,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1860380.0, ans=0.1 2024-08-12 22:35:46,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1860380.0, ans=0.125 2024-08-12 22:35:51,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1860380.0, ans=15.0 2024-08-12 22:35:54,026 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 22:35:54,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1860480.0, ans=0.125 2024-08-12 22:35:55,822 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 22:36:08,252 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12150, loss[loss=0.1033, beats_loss=0.009518, ecapa_loss=0.000148, whisper_loss=0.09228, over 15319.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001701, whisper_loss=0.09207, over 3811069.99 frames. ], batch size: 56, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:36:09,828 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 22:36:11,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1860580.0, ans=0.2 2024-08-12 22:36:12,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1860580.0, ans=0.1 2024-08-12 22:36:14,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1860580.0, ans=0.035 2024-08-12 22:36:16,727 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 22:36:17,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-12 22:36:26,169 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 22:36:36,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5 2024-08-12 22:36:37,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1860780.0, ans=0.125 2024-08-12 22:36:43,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1860780.0, ans=0.0 2024-08-12 22:36:51,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1860880.0, ans=0.125 2024-08-12 22:36:55,404 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 22:36:55,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1860880.0, ans=0.05 2024-08-12 22:37:04,943 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 22:37:13,483 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 22:37:18,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12200, loss[loss=0.0849, beats_loss=0.01269, ecapa_loss=0.000182, whisper_loss=0.07038, over 15637.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0109, ecapa_loss=0.0001698, whisper_loss=0.09105, over 3790418.94 frames. ], batch size: 65, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:37:40,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1861180.0, ans=0.0 2024-08-12 22:37:49,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.525e+01 2.741e+01 3.168e+01 5.471e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-12 22:38:11,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1861380.0, ans=0.2 2024-08-12 22:38:20,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1861480.0, ans=0.0 2024-08-12 22:38:29,550 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12250, loss[loss=0.08589, beats_loss=0.01255, ecapa_loss=0.0001908, whisper_loss=0.07143, over 19083.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001706, whisper_loss=0.09167, over 3806016.65 frames. 
], batch size: 79, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:38:32,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1861580.0, ans=0.125 2024-08-12 22:38:33,843 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 22:38:46,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1861680.0, ans=0.125 2024-08-12 22:38:53,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-12 22:39:08,410 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 22:39:08,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1861780.0, ans=0.1 2024-08-12 22:39:18,788 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 22:39:19,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1861880.0, ans=0.125 2024-08-12 22:39:27,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1861980.0, ans=0.0 2024-08-12 22:39:40,998 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12300, loss[loss=0.1197, beats_loss=0.01141, ecapa_loss=0.0001625, whisper_loss=0.1066, over 22935.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.000172, whisper_loss=0.09159, over 3834796.74 frames. 
], batch size: 88, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:39:44,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1862080.0, ans=0.125 2024-08-12 22:39:47,632 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 13 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 22:39:50,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1862080.0, ans=0.125 2024-08-12 22:39:52,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1862080.0, ans=0.125 2024-08-12 22:39:57,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1862180.0, ans=0.0 2024-08-12 22:40:04,530 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 22:40:11,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.506e+01 2.717e+01 3.049e+01 5.234e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 22:40:23,500 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 22:40:40,694 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 22:40:48,576 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12350, loss[loss=0.1185, beats_loss=0.01033, ecapa_loss=0.000166, whisper_loss=0.1065, over 21486.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001726, whisper_loss=0.09205, over 3870738.05 frames. 
], batch size: 83, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:40:49,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1862580.0, ans=0.1 2024-08-12 22:40:55,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1862580.0, ans=0.125 2024-08-12 22:41:07,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1862680.0, ans=0.125 2024-08-12 22:41:07,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-08-12 22:41:12,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1862680.0, ans=0.125 2024-08-12 22:41:16,830 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 22:41:24,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1862780.0, ans=0.125 2024-08-12 22:41:30,390 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 22:41:31,806 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 22:41:42,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. 
limit=6.0 2024-08-12 22:41:44,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1862980.0, ans=0.125 2024-08-12 22:41:58,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1863080.0, ans=0.1 2024-08-12 22:41:59,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12400, loss[loss=0.08831, beats_loss=0.01255, ecapa_loss=0.000147, whisper_loss=0.07428, over 16352.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001706, whisper_loss=0.09225, over 3877531.01 frames. ], batch size: 64, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:42:26,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-12 22:42:29,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.619e+01 2.853e+01 3.347e+01 1.216e+02, threshold=5.705e+01, percent-clipped=2.0 2024-08-12 22:42:29,833 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 22:42:43,454 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-12 22:42:54,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1863480.0, ans=0.125 2024-08-12 22:42:56,948 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 22:43:08,996 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12450, loss[loss=0.07091, beats_loss=0.01386, ecapa_loss=0.0001461, whisper_loss=0.0556, over 16274.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001717, whisper_loss=0.09183, over 3851209.49 frames. 
], batch size: 66, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:43:10,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1863580.0, ans=0.04949747468305833 2024-08-12 22:43:21,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1863680.0, ans=0.0 2024-08-12 22:43:38,900 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 22:43:57,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1863880.0, ans=0.2 2024-08-12 22:44:06,942 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 22:44:10,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1863980.0, ans=0.125 2024-08-12 22:44:11,097 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 22:44:12,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1863980.0, ans=0.0 2024-08-12 22:44:19,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12500, loss[loss=0.09685, beats_loss=0.01235, ecapa_loss=0.0001445, whisper_loss=0.08305, over 21646.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001706, whisper_loss=0.09164, over 3885363.95 frames. ], batch size: 87, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:44:19,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1864080.0, ans=0.125 2024-08-12 22:44:25,452 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-12 22:44:45,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1864280.0, ans=0.1 2024-08-12 22:44:48,291 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 22:44:49,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.439e+01 2.730e+01 3.074e+01 7.978e+01, threshold=5.460e+01, percent-clipped=1.0 2024-08-12 22:44:50,733 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 22:44:52,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1864280.0, ans=0.0 2024-08-12 22:44:56,909 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 22:45:17,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1864480.0, ans=0.125 2024-08-12 22:45:26,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12550, loss[loss=0.1008, beats_loss=0.009919, ecapa_loss=0.0002115, whisper_loss=0.08875, over 21807.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001702, whisper_loss=0.09155, over 3865306.32 frames. 
], batch size: 92, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:45:40,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1864680.0, ans=0.1 2024-08-12 22:46:01,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1864780.0, ans=0.1 2024-08-12 22:46:05,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1864880.0, ans=0.125 2024-08-12 22:46:09,803 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 22:46:12,176 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 22:46:25,762 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 22:46:30,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1864980.0, ans=0.125 2024-08-12 22:46:31,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1864980.0, ans=0.125 2024-08-12 22:46:32,493 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 22:46:33,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12600, loss[loss=0.09831, beats_loss=0.0105, ecapa_loss=0.0002185, whisper_loss=0.08562, over 14722.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001703, whisper_loss=0.09165, over 3868344.58 frames. 
], batch size: 62, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:46:43,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1865080.0, ans=0.04949747468305833 2024-08-12 22:47:03,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.534e+01 2.817e+01 3.269e+01 5.497e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 22:47:10,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1865280.0, ans=0.0 2024-08-12 22:47:12,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=15.0 2024-08-12 22:47:19,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1865380.0, ans=0.1 2024-08-12 22:47:28,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1865480.0, ans=0.125 2024-08-12 22:47:38,133 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 22:47:42,013 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12650, loss[loss=0.1165, beats_loss=0.01101, ecapa_loss=0.0001572, whisper_loss=0.1039, over 22831.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0111, ecapa_loss=0.0001686, whisper_loss=0.09094, over 3866873.94 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:48:04,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.32 vs. limit=10.0 2024-08-12 22:48:05,444 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 22:48:42,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1865980.0, ans=0.125 2024-08-12 22:48:43,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1865980.0, ans=0.125 2024-08-12 22:48:46,447 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-12 22:48:50,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12700, loss[loss=0.09284, beats_loss=0.01291, ecapa_loss=0.0001307, whisper_loss=0.07862, over 15566.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01114, ecapa_loss=0.000168, whisper_loss=0.09015, over 3834305.74 frames. ], batch size: 62, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:48:53,741 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2024-08-12 22:48:59,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1866080.0, ans=0.125 2024-08-12 22:49:01,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1866080.0, ans=0.2 2024-08-12 22:49:08,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1866180.0, ans=0.0 2024-08-12 22:49:17,799 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 22:49:21,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.468e+01 2.692e+01 3.051e+01 4.394e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-12 22:49:22,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1866280.0, ans=0.0 2024-08-12 22:49:24,555 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 22:49:30,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1866280.0, ans=0.125 2024-08-12 22:49:48,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.74 vs. limit=10.0 2024-08-12 22:49:59,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12750, loss[loss=0.1152, beats_loss=0.01043, ecapa_loss=0.0001793, whisper_loss=0.103, over 18627.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01113, ecapa_loss=0.00017, whisper_loss=0.09049, over 3841007.57 frames. ], batch size: 75, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:50:15,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866680.0, ans=0.1 2024-08-12 22:50:24,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1866780.0, ans=0.125 2024-08-12 22:50:24,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1866780.0, ans=0.125 2024-08-12 22:50:31,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.38 vs. 
limit=12.0 2024-08-12 22:50:34,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1866780.0, ans=0.2 2024-08-12 22:50:37,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-12 22:50:42,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1866880.0, ans=0.125 2024-08-12 22:50:47,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1866880.0, ans=0.125 2024-08-12 22:50:51,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1866980.0, ans=0.2 2024-08-12 22:50:52,332 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 22:50:56,583 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 12 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 22:50:57,930 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 34 from Vox, 28 fro AS 2024-08-12 22:51:05,695 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12800, loss[loss=0.1099, beats_loss=0.01311, ecapa_loss=0.0001691, whisper_loss=0.09509, over 16143.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01122, ecapa_loss=0.0001693, whisper_loss=0.09013, over 3861261.13 frames. ], batch size: 66, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:51:21,139 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 22:51:25,167 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
20 from LS+wenet, 11 from Vox, 49 fro AS 2024-08-12 22:51:35,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.415e+01 2.675e+01 2.893e+01 6.675e+01, threshold=5.350e+01, percent-clipped=1.0 2024-08-12 22:51:41,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=1867280.0, ans=0.2 2024-08-12 22:52:05,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1867480.0, ans=0.025 2024-08-12 22:52:13,118 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12850, loss[loss=0.1399, beats_loss=0.009122, ecapa_loss=0.0002035, whisper_loss=0.1288, over 22631.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01109, ecapa_loss=0.0001704, whisper_loss=0.09138, over 3877405.89 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:52:36,176 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 22:52:51,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1867780.0, ans=0.125 2024-08-12 22:53:08,754 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 22:53:23,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12900, loss[loss=0.1095, beats_loss=0.01229, ecapa_loss=0.0001397, whisper_loss=0.09582, over 22660.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01115, ecapa_loss=0.0001721, whisper_loss=0.09032, over 3871725.23 frames. ], batch size: 91, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:53:37,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. 
limit=15.0 2024-08-12 22:53:47,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-08-12 22:53:48,309 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 22:53:48,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-12 22:53:53,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.434e+01 2.743e+01 3.168e+01 4.693e+01, threshold=5.486e+01, percent-clipped=0.0 2024-08-12 22:53:56,478 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 22:54:06,895 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 22:54:09,380 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 22:54:11,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-12 22:54:24,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1868480.0, ans=0.0 2024-08-12 22:54:31,254 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 22:54:32,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 12950, loss[loss=0.09018, beats_loss=0.01134, ecapa_loss=0.0002173, whisper_loss=0.07667, over 18289.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001722, whisper_loss=0.09158, over 3886189.01 frames. 
], batch size: 76, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:54:45,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1868680.0, ans=0.125 2024-08-12 22:54:55,394 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 22:55:09,093 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 22:55:13,391 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 22:55:22,880 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 22:55:27,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1868980.0, ans=0.1 2024-08-12 22:55:33,426 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 22:55:35,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1868980.0, ans=0.1 2024-08-12 22:55:38,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1869080.0, ans=0.5 2024-08-12 22:55:40,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13000, loss[loss=0.1116, beats_loss=0.009597, ecapa_loss=0.0001986, whisper_loss=0.1, over 16694.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001725, whisper_loss=0.09142, over 3891138.56 frames. 
], batch size: 69, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:55:47,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1869080.0, ans=0.1 2024-08-12 22:55:49,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1869080.0, ans=0.07 2024-08-12 22:55:52,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1869180.0, ans=0.0 2024-08-12 22:55:52,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2024-08-12 22:55:54,848 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 22:56:09,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.487e+01 2.816e+01 3.426e+01 7.138e+01, threshold=5.633e+01, percent-clipped=2.0 2024-08-12 22:56:26,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1869380.0, ans=0.125 2024-08-12 22:56:27,785 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 16 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 22:56:30,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2024-08-12 22:56:46,945 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13050, loss[loss=0.09834, beats_loss=0.01257, ecapa_loss=0.0001966, whisper_loss=0.0838, over 20951.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01102, ecapa_loss=0.0001715, whisper_loss=0.09086, over 3894360.75 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 22:56:56,964 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 22:57:00,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2024-08-12 22:57:10,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1869680.0, ans=0.2 2024-08-12 22:57:11,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1869680.0, ans=0.0 2024-08-12 22:57:23,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1869780.0, ans=0.125 2024-08-12 22:57:32,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1869880.0, ans=0.2 2024-08-12 22:57:34,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1869880.0, ans=0.125 2024-08-12 22:57:51,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2024-08-12 22:57:53,302 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13100, loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0002019, whisper_loss=0.09068, over 22238.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001726, whisper_loss=0.0911, over 3863431.08 frames. ], batch size: 92, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:57:53,460 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 22:58:11,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1870180.0, ans=0.0 2024-08-12 22:58:14,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1870180.0, ans=0.0 2024-08-12 22:58:22,832 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-12 22:58:23,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.487e+01 2.739e+01 3.111e+01 4.282e+01, threshold=5.479e+01, percent-clipped=0.0 2024-08-12 22:58:43,073 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 22:58:58,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1870480.0, ans=10.0 2024-08-12 22:59:00,274 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13150, loss[loss=0.08762, beats_loss=0.009774, ecapa_loss=0.0001795, whisper_loss=0.07605, over 20633.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001705, whisper_loss=0.09114, over 3861732.14 frames. ], batch size: 81, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:59:03,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=15.0 2024-08-12 22:59:22,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1870680.0, ans=0.125 2024-08-12 22:59:50,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1870880.0, ans=0.0 2024-08-12 23:00:05,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1871080.0, ans=0.0 2024-08-12 23:00:06,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13200, loss[loss=0.0925, beats_loss=0.01244, ecapa_loss=0.0001566, whisper_loss=0.0785, over 19095.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001702, whisper_loss=0.09131, over 3850931.88 frames. ], batch size: 79, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:00:08,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1871080.0, ans=0.05 2024-08-12 23:00:19,199 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2024-08-12 23:00:36,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.560e+01 2.764e+01 3.178e+01 9.126e+01, threshold=5.529e+01, percent-clipped=1.0 2024-08-12 23:00:58,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1871480.0, ans=0.125 2024-08-12 23:01:02,219 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-12 23:01:05,053 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
19 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 23:01:12,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13250, loss[loss=0.08778, beats_loss=0.01159, ecapa_loss=0.0001471, whisper_loss=0.07472, over 20329.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001715, whisper_loss=0.09138, over 3864528.03 frames. ], batch size: 79, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:01:14,109 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 23 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-12 23:01:14,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1871580.0, ans=0.1 2024-08-12 23:01:42,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1871780.0, ans=0.04949747468305833 2024-08-12 23:01:58,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1871880.0, ans=0.125 2024-08-12 23:02:08,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2024-08-12 23:02:08,720 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 23:02:13,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-12 23:02:14,377 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 23:02:20,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13300, loss[loss=0.0965, beats_loss=0.012, ecapa_loss=0.0001755, whisper_loss=0.08274, over 19708.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001705, whisper_loss=0.09086, over 3828383.76 frames. 
], batch size: 80, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:02:43,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1872180.0, ans=0.2 2024-08-12 23:02:52,183 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.491e+01 2.756e+01 2.982e+01 7.499e+01, threshold=5.512e+01, percent-clipped=1.0 2024-08-12 23:02:59,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1872280.0, ans=0.125 2024-08-12 23:03:04,784 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-12 23:03:11,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1872380.0, ans=0.125 2024-08-12 23:03:26,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1872480.0, ans=0.5 2024-08-12 23:03:27,266 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 23:03:28,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13350, loss[loss=0.1031, beats_loss=0.01143, ecapa_loss=0.0001693, whisper_loss=0.08999, over 16419.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.00017, whisper_loss=0.09116, over 3821781.36 frames. ], batch size: 65, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:03:37,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1872580.0, ans=0.0 2024-08-12 23:03:49,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. 
limit=15.0 2024-08-12 23:04:03,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1872780.0, ans=0.125 2024-08-12 23:04:10,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1872880.0, ans=0.125 2024-08-12 23:04:11,548 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 23:04:15,238 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 23:04:16,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1872880.0, ans=0.125 2024-08-12 23:04:24,799 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 23:04:26,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1872980.0, ans=0.125 2024-08-12 23:04:34,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1873080.0, ans=0.125 2024-08-12 23:04:35,478 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13400, loss[loss=0.1189, beats_loss=0.01035, ecapa_loss=0.0001823, whisper_loss=0.1067, over 22523.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001701, whisper_loss=0.09103, over 3858672.49 frames. ], batch size: 89, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:04:41,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1873080.0, ans=0.07 2024-08-12 23:04:49,030 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 23:04:49,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1873180.0, ans=0.125 2024-08-12 23:04:54,536 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 18 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 23:05:01,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1873280.0, ans=0.0 2024-08-12 23:05:06,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.402e+01 2.808e+01 3.201e+01 5.167e+01, threshold=5.616e+01, percent-clipped=0.0 2024-08-12 23:05:41,447 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13450, loss[loss=0.1013, beats_loss=0.0111, ecapa_loss=0.0001402, whisper_loss=0.0888, over 17459.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001707, whisper_loss=0.0909, over 3844844.81 frames. ], batch size: 65, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:06:00,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=15.0 2024-08-12 23:06:38,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1873980.0, ans=0.125 2024-08-12 23:06:42,519 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-08-12 23:06:48,026 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13500, loss[loss=0.09964, beats_loss=0.007798, ecapa_loss=0.0002043, whisper_loss=0.0898, over 13844.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001721, whisper_loss=0.09119, over 3831298.61 frames. 
], batch size: 55, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:06:48,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1874080.0, ans=0.125 2024-08-12 23:06:49,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-12 23:07:03,356 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 23:07:19,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.452e+01 2.723e+01 3.030e+01 4.696e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-12 23:07:55,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13550, loss[loss=0.09857, beats_loss=0.01177, ecapa_loss=0.0001671, whisper_loss=0.08513, over 22364.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001707, whisper_loss=0.09095, over 3858140.46 frames. ], batch size: 91, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:08:01,431 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=22.5 2024-08-12 23:08:12,532 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 23:08:15,360 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 23:08:16,656 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 23:08:20,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1874780.0, ans=0.125 2024-08-12 23:08:30,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1874780.0, ans=0.125 2024-08-12 23:08:31,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1874780.0, ans=0.1 2024-08-12 23:08:35,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1874880.0, ans=0.125 2024-08-12 23:08:37,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1874880.0, ans=0.1 2024-08-12 23:08:59,771 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 23:09:02,104 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13600, loss[loss=0.1154, beats_loss=0.008978, ecapa_loss=0.0001798, whisper_loss=0.1046, over 14790.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01096, ecapa_loss=0.0001713, whisper_loss=0.0908, over 3841528.69 frames. ], batch size: 58, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:09:08,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-08-12 23:09:10,776 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. 
limit=15.0 2024-08-12 23:09:13,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1875080.0, ans=0.1 2024-08-12 23:09:18,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1875180.0, ans=0.125 2024-08-12 23:09:21,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1875180.0, ans=0.125 2024-08-12 23:09:28,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1875280.0, ans=0.2 2024-08-12 23:09:30,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-12 23:09:32,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.462e+01 2.883e+01 3.310e+01 7.463e+01, threshold=5.766e+01, percent-clipped=1.0 2024-08-12 23:09:38,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=12.0 2024-08-12 23:09:42,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2024-08-12 23:09:44,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2024-08-12 23:09:47,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1875380.0, ans=0.125 2024-08-12 23:09:55,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1875480.0, ans=0.1 2024-08-12 23:09:55,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1875480.0, ans=0.2 2024-08-12 23:10:07,303 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13650, loss[loss=0.1003, beats_loss=0.009486, ecapa_loss=0.0002178, whisper_loss=0.08861, over 15424.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001725, whisper_loss=0.09115, over 3854980.80 frames. ], batch size: 65, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:10:15,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1875580.0, ans=0.125 2024-08-12 23:10:24,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-08-12 23:10:25,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1875680.0, ans=0.125 2024-08-12 23:10:43,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1875780.0, ans=0.035 2024-08-12 23:10:53,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. 
limit=15.0 2024-08-12 23:11:02,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1875980.0, ans=0.125 2024-08-12 23:11:10,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1875980.0, ans=0.0 2024-08-12 23:11:11,859 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 23:11:14,546 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13700, loss[loss=0.1047, beats_loss=0.01308, ecapa_loss=0.0001719, whisper_loss=0.08993, over 23423.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001722, whisper_loss=0.09172, over 3848472.58 frames. ], batch size: 95, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:11:19,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1876080.0, ans=0.125 2024-08-12 23:11:28,543 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-12 23:11:28,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1876180.0, ans=0.2 2024-08-12 23:11:32,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1876180.0, ans=0.0 2024-08-12 23:11:44,997 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.467e+01 2.777e+01 3.137e+01 6.258e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-12 23:11:47,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1876280.0, ans=0.2 2024-08-12 23:12:01,531 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 23:12:15,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1876480.0, ans=0.2 2024-08-12 23:12:21,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13750, loss[loss=0.1075, beats_loss=0.01249, ecapa_loss=0.0001205, whisper_loss=0.09379, over 15506.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001727, whisper_loss=0.09192, over 3860221.37 frames. ], batch size: 57, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:12:32,027 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 23:12:34,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5 2024-08-12 23:12:38,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1876680.0, ans=0.125 2024-08-12 23:12:54,590 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 23:13:00,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1876780.0, ans=0.0 2024-08-12 23:13:03,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=12.0 2024-08-12 23:13:05,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1876880.0, ans=0.125 2024-08-12 23:13:24,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1876980.0, ans=0.0 2024-08-12 23:13:29,879 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 23:13:31,991 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13800, loss[loss=0.087, beats_loss=0.0137, ecapa_loss=0.0001364, whisper_loss=0.07193, over 22650.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001698, whisper_loss=0.09106, over 3820537.81 frames. ], batch size: 91, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:13:51,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-12 23:13:58,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1877180.0, ans=0.125 2024-08-12 23:14:01,004 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 23:14:06,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.452e+01 2.663e+01 3.049e+01 4.287e+01, threshold=5.326e+01, percent-clipped=0.0 2024-08-12 23:14:07,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2024-08-12 23:14:11,668 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 23:14:16,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1877380.0, ans=0.2 2024-08-12 23:14:18,094 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 23:14:28,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1877380.0, ans=0.125 2024-08-12 23:14:33,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. 
limit=15.0 2024-08-12 23:14:47,592 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13850, loss[loss=0.1119, beats_loss=0.009423, ecapa_loss=0.0001597, whisper_loss=0.1009, over 18892.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001695, whisper_loss=0.09138, over 3844156.32 frames. ], batch size: 72, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:14:48,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-12 23:14:54,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1877580.0, ans=0.125 2024-08-12 23:14:55,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1877580.0, ans=0.2 2024-08-12 23:15:14,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1877680.0, ans=0.0 2024-08-12 23:15:45,083 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 23:16:04,690 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13900, loss[loss=0.1066, beats_loss=0.01012, ecapa_loss=0.0001644, whisper_loss=0.09485, over 17258.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001689, whisper_loss=0.09174, over 3867324.44 frames. ], batch size: 66, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:16:14,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-08-12 23:16:17,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.46 vs. 
limit=8.0 2024-08-12 23:16:18,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.09 vs. limit=22.5 2024-08-12 23:16:35,562 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-12 23:16:37,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1878280.0, ans=0.125 2024-08-12 23:16:39,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.486e+01 2.775e+01 2.978e+01 4.704e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 23:16:44,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2024-08-12 23:17:14,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1878480.0, ans=0.125 2024-08-12 23:17:19,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 13950, loss[loss=0.1085, beats_loss=0.009231, ecapa_loss=0.0001883, whisper_loss=0.09736, over 22501.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001702, whisper_loss=0.09131, over 3870752.93 frames. ], batch size: 93, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:17:31,521 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 23:17:38,552 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 11 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 23:17:43,073 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 23:18:29,450 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 23:18:35,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14000, loss[loss=0.097, beats_loss=0.008889, ecapa_loss=0.0002162, whisper_loss=0.08595, over 17900.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001696, whisper_loss=0.09161, over 3861279.28 frames. ], batch size: 75, lr: 4.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:18:38,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1879080.0, ans=0.125 2024-08-12 23:19:09,754 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.518e+01 2.898e+01 3.200e+01 5.053e+01, threshold=5.795e+01, percent-clipped=0.0 2024-08-12 23:19:18,726 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 23:19:24,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1879380.0, ans=0.2 2024-08-12 23:19:29,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1879380.0, ans=0.125 2024-08-12 23:19:38,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1879480.0, ans=0.1 2024-08-12 23:19:51,647 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14050, loss[loss=0.09653, beats_loss=0.01173, ecapa_loss=0.000146, whisper_loss=0.08334, over 18246.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001697, whisper_loss=0.09202, over 3843213.02 frames. 
], batch size: 70, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:19:56,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1879580.0, ans=0.0 2024-08-12 23:20:10,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1879680.0, ans=0.0 2024-08-12 23:20:12,275 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 23:20:29,602 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 23:20:46,168 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:20:50,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1879980.0, ans=0.125 2024-08-12 23:21:05,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=12.0 2024-08-12 23:21:08,983 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14100, loss[loss=0.1197, beats_loss=0.01085, ecapa_loss=0.0001402, whisper_loss=0.1074, over 23573.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001694, whisper_loss=0.09198, over 3868434.61 frames. ], batch size: 90, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:21:10,237 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-12 23:21:22,527 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 23:21:26,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. 
limit=15.0 2024-08-12 23:21:30,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1880180.0, ans=0.125 2024-08-12 23:21:34,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1880180.0, ans=0.0 2024-08-12 23:21:41,502 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 23:21:44,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.402e+01 2.759e+01 3.024e+01 5.678e+01, threshold=5.519e+01, percent-clipped=0.0 2024-08-12 23:21:46,137 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 32 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 23:21:52,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1880280.0, ans=0.5 2024-08-12 23:22:02,651 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-12 23:22:06,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1880380.0, ans=0.125 2024-08-12 23:22:12,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.34 vs. limit=22.5 2024-08-12 23:22:27,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14150, loss[loss=0.0947, beats_loss=0.009204, ecapa_loss=0.0001516, whisper_loss=0.08398, over 14889.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001698, whisper_loss=0.09203, over 3829092.16 frames. ], batch size: 58, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:22:27,370 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 23:22:45,008 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
17 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 23:22:49,510 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 23:23:05,313 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.787e-01 2024-08-12 23:23:08,936 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 23:23:18,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1880880.0, ans=0.05 2024-08-12 23:23:39,592 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 23:23:46,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14200, loss[loss=0.1087, beats_loss=0.009257, ecapa_loss=0.000194, whisper_loss=0.09752, over 21335.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001694, whisper_loss=0.09117, over 3856227.74 frames. ], batch size: 92, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:24:04,412 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 23 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 23:24:16,058 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2024-08-12 23:24:23,369 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 23:24:24,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.554e+01 2.881e+01 3.378e+01 7.854e+01, threshold=5.762e+01, percent-clipped=3.0 2024-08-12 23:24:46,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.53 vs. 
limit=15.0 2024-08-12 23:25:00,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1881480.0, ans=0.1 2024-08-12 23:25:07,538 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14250, loss[loss=0.1223, beats_loss=0.007955, ecapa_loss=0.0001566, whisper_loss=0.1127, over 15884.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.0001677, whisper_loss=0.09207, over 3890517.83 frames. ], batch size: 61, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:25:09,190 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 23:25:33,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1881680.0, ans=0.125 2024-08-12 23:25:51,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1881880.0, ans=10.0 2024-08-12 23:25:53,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=12.0 2024-08-12 23:26:10,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1881980.0, ans=0.05 2024-08-12 23:26:13,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1881980.0, ans=0.1 2024-08-12 23:26:24,037 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14300, loss[loss=0.1302, beats_loss=0.009098, ecapa_loss=0.0001459, whisper_loss=0.1196, over 25552.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001678, whisper_loss=0.09181, over 3899288.96 frames. 
], batch size: 96, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:26:27,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1882080.0, ans=0.125 2024-08-12 23:26:36,501 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-12 23:26:58,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.532e+01 2.791e+01 3.195e+01 4.924e+01, threshold=5.583e+01, percent-clipped=0.0 2024-08-12 23:26:59,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1882280.0, ans=0.0 2024-08-12 23:27:00,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2024-08-12 23:27:00,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-12 23:27:02,802 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 23:27:04,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1882280.0, ans=0.2 2024-08-12 23:27:06,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1882280.0, ans=0.07 2024-08-12 23:27:06,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1882280.0, ans=0.125 2024-08-12 23:27:10,399 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 23:27:12,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1882380.0, ans=0.125 2024-08-12 23:27:14,588 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 23:27:15,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1882380.0, ans=0.125 2024-08-12 23:27:17,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1882380.0, ans=0.125 2024-08-12 23:27:28,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-12 23:27:39,417 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14350, loss[loss=0.1064, beats_loss=0.01283, ecapa_loss=0.000154, whisper_loss=0.09202, over 22203.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001682, whisper_loss=0.09184, over 3907126.86 frames. ], batch size: 87, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:27:39,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1882580.0, ans=0.0 2024-08-12 23:27:51,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1882580.0, ans=0.125 2024-08-12 23:27:58,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-08-12 23:28:04,822 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 23:28:20,788 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2024-08-12 23:28:23,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1882780.0, ans=0.0 2024-08-12 23:28:38,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2024-08-12 23:28:45,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1882980.0, ans=0.0 2024-08-12 23:28:58,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14400, loss[loss=0.1085, beats_loss=0.0114, ecapa_loss=0.0001539, whisper_loss=0.09554, over 13608.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01093, ecapa_loss=0.0001712, whisper_loss=0.09173, over 3905027.08 frames. ], batch size: 53, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:29:01,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1883080.0, ans=0.0 2024-08-12 23:29:10,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1883080.0, ans=0.0 2024-08-12 23:29:10,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1883080.0, ans=0.125 2024-08-12 23:29:15,097 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 23:29:17,924 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. 
limit=22.5 2024-08-12 23:29:33,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.540e+01 2.866e+01 3.197e+01 2.206e+02, threshold=5.732e+01, percent-clipped=2.0 2024-08-12 23:29:35,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1883280.0, ans=0.1 2024-08-12 23:29:47,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.84 vs. limit=10.0 2024-08-12 23:30:14,225 INFO [train_multi_KD3.py:1116] (1/4) Epoch 13, batch 14450, loss[loss=0.08549, beats_loss=0.0133, ecapa_loss=0.0001716, whisper_loss=0.07047, over 21573.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001726, whisper_loss=0.0915, over 3877641.69 frames. ], batch size: 91, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:30:22,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1883580.0, ans=0.125 2024-08-12 23:30:37,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-12 23:30:44,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1883780.0, ans=0.125 2024-08-12 23:30:57,124 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-12 23:30:57,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1883780.0, ans=0.0 2024-08-12 23:30:57,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5 2024-08-12 23:31:09,141 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
23 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 23:31:54,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 0, loss[loss=0.093, beats_loss=0.01278, ecapa_loss=0.0001824, whisper_loss=0.0784, over 17232.00 frames. ], tot_loss[loss=0.093, beats_loss=0.01278, ecapa_loss=0.0001824, whisper_loss=0.0784, over 17232.00 frames. ], batch size: 71, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:31:54,482 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-12 23:32:30,946 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on ASR_libri: loss=0.2554, beats_loss=0, ecapa_loss=0.0005808, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 23:32:45,391 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.0993e-02, 2.1354e-03, 3.7141e-03, 3.8168e+00, 3.8460e-03, 5.3238e-02, 4.0340e-02, 1.0868e-02], device='cuda:1') 2024-08-12 23:32:45,451 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7268, 2.8585, 2.9018, 2.7478], device='cuda:1') 2024-08-12 23:32:47,217 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on SV_voxceleb1: loss=0.004647, beats_loss=0, ecapa_loss=0.0004647, whisper_loss=0, over 939242.00 frames. 2024-08-12 23:34:33,373 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on AT_audioset: loss=0.02401, beats_loss=0.02401, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 23:34:33,376 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-12 23:35:05,896 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=12.0 2024-08-12 23:35:07,145 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
30 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 23:35:22,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1884190.0, ans=0.0 2024-08-12 23:35:27,505 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 23:35:48,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-08-12 23:35:51,196 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 23:35:53,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.601e+01 2.897e+01 3.214e+01 1.891e+02, threshold=5.795e+01, percent-clipped=1.0 2024-08-12 23:35:54,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1884290.0, ans=0.125 2024-08-12 23:36:14,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1884390.0, ans=0.125 2024-08-12 23:36:16,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884390.0, ans=0.1 2024-08-12 23:36:28,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0 2024-08-12 23:36:31,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2024-08-12 23:36:35,580 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 50, loss[loss=0.1192, beats_loss=0.008285, ecapa_loss=0.0001833, whisper_loss=0.1091, over 21792.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01006, ecapa_loss=0.0001752, whisper_loss=0.09264, over 900187.45 frames. 
], batch size: 86, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:36:45,813 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-12 23:37:08,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884590.0, ans=0.1 2024-08-12 23:37:16,997 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 23:37:20,634 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 23:37:53,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1884690.0, ans=0.0 2024-08-12 23:37:55,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1884790.0, ans=0.125 2024-08-12 23:37:58,628 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 23:38:43,344 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 100, loss[loss=0.1122, beats_loss=0.01, ecapa_loss=0.0002179, whisper_loss=0.1, over 17489.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.009967, ecapa_loss=0.0001709, whisper_loss=0.09218, over 1525070.94 frames. ], batch size: 72, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:39:16,068 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
17 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-12 23:39:19,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1885090.0, ans=0.07 2024-08-12 23:39:26,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1885090.0, ans=0.1 2024-08-12 23:40:24,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.825e+01 3.064e+01 3.241e+01 4.540e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 23:41:11,559 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 150, loss[loss=0.1127, beats_loss=0.008614, ecapa_loss=0.0001991, whisper_loss=0.1021, over 19287.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01007, ecapa_loss=0.0001711, whisper_loss=0.09136, over 2033821.97 frames. ], batch size: 77, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:41:18,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1885490.0, ans=0.95 2024-08-12 23:41:22,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1885490.0, ans=0.125 2024-08-12 23:41:26,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1885490.0, ans=0.1 2024-08-12 23:41:36,108 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 23:42:02,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1885690.0, ans=0.125 2024-08-12 23:42:39,559 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 23:42:52,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1885790.0, ans=0.125 2024-08-12 23:43:16,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.20 vs. limit=22.5 2024-08-12 23:43:23,172 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 200, loss[loss=0.1012, beats_loss=0.008804, ecapa_loss=0.0001935, whisper_loss=0.0905, over 14015.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01031, ecapa_loss=0.0001724, whisper_loss=0.09104, over 2400072.29 frames. ], batch size: 55, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:43:26,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1885990.0, ans=0.1 2024-08-12 23:43:28,756 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 23:44:20,393 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 23:44:25,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1886190.0, ans=0.2 2024-08-12 23:44:40,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.587e+01 2.870e+01 3.355e+01 1.552e+02, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 23:44:51,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1886290.0, ans=0.0 2024-08-12 23:44:53,941 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 23:45:09,077 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
34 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 23:45:26,541 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 250, loss[loss=0.1172, beats_loss=0.008151, ecapa_loss=0.0001845, whisper_loss=0.1072, over 16289.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01037, ecapa_loss=0.0001721, whisper_loss=0.09171, over 2697023.18 frames. ], batch size: 62, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:45:47,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1886490.0, ans=0.125 2024-08-12 23:46:08,796 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 23:46:16,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1886690.0, ans=0.1 2024-08-12 23:47:05,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=12.0 2024-08-12 23:47:16,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1886890.0, ans=0.09899494936611666 2024-08-12 23:47:22,870 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 23:47:26,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.11 vs. limit=10.0 2024-08-12 23:47:27,401 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 300, loss[loss=0.1049, beats_loss=0.01069, ecapa_loss=0.0002008, whisper_loss=0.09221, over 16281.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01054, ecapa_loss=0.0001719, whisper_loss=0.09136, over 2948320.72 frames. 
], batch size: 65, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:48:16,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.332e+01 2.692e+01 3.047e+01 7.964e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-12 23:48:18,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1887290.0, ans=0.0 2024-08-12 23:48:22,712 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 23:48:27,466 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 23:48:31,568 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 23:48:44,163 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 350, loss[loss=0.1044, beats_loss=0.01136, ecapa_loss=0.0001818, whisper_loss=0.09118, over 17868.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001701, whisper_loss=0.09003, over 3134697.23 frames. ], batch size: 74, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:48:46,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1887490.0, ans=0.0 2024-08-12 23:48:47,849 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 16 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 23:49:07,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1887590.0, ans=0.125 2024-08-12 23:49:11,860 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 26 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-12 23:49:19,486 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 23:49:19,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1887690.0, ans=0.125 2024-08-12 23:49:21,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1887690.0, ans=0.125 2024-08-12 23:49:31,287 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 23:49:33,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1887790.0, ans=0.0 2024-08-12 23:49:35,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1887790.0, ans=0.09899494936611666 2024-08-12 23:49:35,280 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.976e+05 2024-08-12 23:49:36,257 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 6 from Vox, 29 fro AS 2024-08-12 23:49:47,720 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 23:49:59,316 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 400, loss[loss=0.07125, beats_loss=0.01245, ecapa_loss=0.000201, whisper_loss=0.05678, over 21358.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001696, whisper_loss=0.08991, over 3292152.29 frames. ], batch size: 92, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:49:59,491 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 23:50:01,113 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 23:50:08,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1887990.0, ans=0.0 2024-08-12 23:50:15,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1888090.0, ans=0.125 2024-08-12 23:50:31,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1888190.0, ans=0.125 2024-08-12 23:50:33,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1888190.0, ans=0.2 2024-08-12 23:50:51,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.354e+01 2.624e+01 3.158e+01 4.755e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 23:51:01,026 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 18 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 23:51:11,667 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 23:51:14,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1888390.0, ans=0.125 2024-08-12 23:51:17,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 450, loss[loss=0.1067, beats_loss=0.008532, ecapa_loss=0.0001789, whisper_loss=0.09639, over 14202.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01076, ecapa_loss=0.0001711, whisper_loss=0.08891, over 3398248.39 frames. ], batch size: 55, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:51:17,475 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-12 23:51:17,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.16 vs. 
limit=15.0 2024-08-12 23:51:32,970 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 23:51:40,222 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 23:51:55,286 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-12 23:51:59,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2024-08-12 23:52:09,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1888790.0, ans=0.125 2024-08-12 23:52:12,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2024-08-12 23:52:12,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-12 23:52:26,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1888890.0, ans=0.0 2024-08-12 23:52:31,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=12.0 2024-08-12 23:52:33,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 500, loss[loss=0.09643, beats_loss=0.009093, ecapa_loss=0.0001868, whisper_loss=0.08547, over 20204.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01071, ecapa_loss=0.0001692, whisper_loss=0.08906, over 3509371.66 frames. ], batch size: 82, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:52:48,093 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 23:52:50,485 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 23:52:53,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.65 vs. limit=10.0 2024-08-12 23:53:01,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2024-08-12 23:53:24,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.385e+01 2.695e+01 3.088e+01 5.680e+01, threshold=5.390e+01, percent-clipped=1.0 2024-08-12 23:53:26,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2024-08-12 23:53:39,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1889390.0, ans=0.125 2024-08-12 23:53:40,415 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 26 from Vox, 16 fro AS 2024-08-12 23:53:43,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-12 23:53:46,645 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 23:53:49,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 23:53:51,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 550, loss[loss=0.09499, beats_loss=0.01299, ecapa_loss=0.0001366, whisper_loss=0.08063, over 20182.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001682, whisper_loss=0.08967, over 3585378.56 frames. 
], batch size: 79, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:53:52,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-12 23:54:11,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1889590.0, ans=0.1 2024-08-12 23:54:17,649 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 23:54:40,727 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 23:54:54,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1889890.0, ans=0.2 2024-08-12 23:54:55,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1889890.0, ans=0.0 2024-08-12 23:55:06,855 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 600, loss[loss=0.09446, beats_loss=0.01006, ecapa_loss=0.0001818, whisper_loss=0.08258, over 21696.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001681, whisper_loss=0.0895, over 3611702.35 frames. ], batch size: 89, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:55:12,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1889990.0, ans=0.0 2024-08-12 23:55:13,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-12 23:55:25,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1890090.0, ans=0.2 2024-08-12 23:55:30,790 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
24 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-12 23:55:31,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1890090.0, ans=0.0 2024-08-12 23:55:52,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1890290.0, ans=0.125 2024-08-12 23:55:54,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.472e+01 2.658e+01 3.015e+01 7.457e+01, threshold=5.315e+01, percent-clipped=1.0 2024-08-12 23:55:56,997 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 28 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-12 23:56:06,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-12 23:56:16,087 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 23:56:20,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 650, loss[loss=0.109, beats_loss=0.0104, ecapa_loss=0.00019, whisper_loss=0.09672, over 20738.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001672, whisper_loss=0.0911, over 3648441.97 frames. ], batch size: 81, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:56:26,424 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 23:56:32,216 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 23:56:32,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1890490.0, ans=0.1 2024-08-12 23:56:41,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1890590.0, ans=0.5 2024-08-12 23:56:51,626 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 23:57:10,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1890790.0, ans=0.0 2024-08-12 23:57:23,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=10.0 2024-08-12 23:57:24,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1890890.0, ans=0.95 2024-08-12 23:57:30,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1890890.0, ans=0.125 2024-08-12 23:57:36,220 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 700, loss[loss=0.08422, beats_loss=0.01157, ecapa_loss=0.0001434, whisper_loss=0.07122, over 16653.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001668, whisper_loss=0.09039, over 3672891.22 frames. ], batch size: 66, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:57:36,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1890990.0, ans=0.125 2024-08-12 23:57:47,831 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 23:58:04,396 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
11 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 23:58:07,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2024-08-12 23:58:12,844 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 23:58:14,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1891190.0, ans=0.125 2024-08-12 23:58:16,200 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 23:58:19,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1891290.0, ans=0.0 2024-08-12 23:58:23,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-08-12 23:58:24,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.432e+01 2.727e+01 3.024e+01 4.665e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 23:58:27,788 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 23:58:49,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 750, loss[loss=0.1215, beats_loss=0.006849, ecapa_loss=0.0002094, whisper_loss=0.1125, over 19004.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.000166, whisper_loss=0.09045, over 3713378.91 frames. 
], batch size: 79, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:58:54,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1891490.0, ans=0.2 2024-08-12 23:59:00,224 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-12 23:59:01,017 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 23:59:13,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1891590.0, ans=0.0 2024-08-12 23:59:14,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1891590.0, ans=0.125 2024-08-12 23:59:26,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1891690.0, ans=0.125 2024-08-12 23:59:37,162 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 23:59:38,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1891790.0, ans=0.0 2024-08-13 00:00:01,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1891890.0, ans=0.2 2024-08-13 00:00:03,830 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 800, loss[loss=0.1237, beats_loss=0.009059, ecapa_loss=0.0001601, whisper_loss=0.1131, over 16853.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001655, whisper_loss=0.09039, over 3733991.22 frames. 
], batch size: 64, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:00:14,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1891990.0, ans=0.125 2024-08-13 00:00:18,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1892090.0, ans=0.125 2024-08-13 00:00:27,350 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-13 00:00:40,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1892190.0, ans=0.5 2024-08-13 00:00:50,738 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 00:00:54,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.376e+01 2.556e+01 2.956e+01 7.880e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-13 00:00:59,559 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 17 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 00:01:01,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1892290.0, ans=0.125 2024-08-13 00:01:08,314 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 36 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 00:01:19,970 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 850, loss[loss=0.1117, beats_loss=0.01154, ecapa_loss=0.0001492, whisper_loss=0.09862, over 21141.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001652, whisper_loss=0.08996, over 3755471.96 frames. ], batch size: 83, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:01:20,245 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
24 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-13 00:01:45,669 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 00:01:47,071 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 00:01:47,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2024-08-13 00:01:52,412 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 00:01:58,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-08-13 00:02:09,884 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 22 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-13 00:02:32,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 900, loss[loss=0.0971, beats_loss=0.01266, ecapa_loss=0.000158, whisper_loss=0.08286, over 14706.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001644, whisper_loss=0.08989, over 3771760.34 frames. ], batch size: 57, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:02:34,942 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 00:02:45,602 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 00:02:48,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1893090.0, ans=0.0 2024-08-13 00:02:55,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1893090.0, ans=0.0 2024-08-13 00:03:01,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=15.0 2024-08-13 00:03:17,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1893290.0, ans=0.125 2024-08-13 00:03:19,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.408e+01 2.662e+01 2.977e+01 4.425e+01, threshold=5.325e+01, percent-clipped=0.0 2024-08-13 00:03:25,281 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 00:03:41,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2024-08-13 00:03:41,756 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 00:03:44,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 950, loss[loss=0.1105, beats_loss=0.01057, ecapa_loss=0.0001496, whisper_loss=0.09839, over 20052.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001642, whisper_loss=0.09087, over 3790832.53 frames. ], batch size: 78, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:04:04,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1893590.0, ans=0.0 2024-08-13 00:04:05,699 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-13 00:04:11,232 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 00:04:33,793 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
9 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 00:04:48,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1893890.0, ans=0.125 2024-08-13 00:04:53,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1893890.0, ans=0.125 2024-08-13 00:04:59,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1893990.0, ans=0.125 2024-08-13 00:04:59,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1000, loss[loss=0.09754, beats_loss=0.008435, ecapa_loss=0.0001695, whisper_loss=0.08741, over 14092.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001641, whisper_loss=0.09071, over 3789237.88 frames. ], batch size: 56, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:05:04,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1893990.0, ans=0.125 2024-08-13 00:05:11,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1893990.0, ans=0.05 2024-08-13 00:05:20,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1894090.0, ans=0.125 2024-08-13 00:05:48,067 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.405e+01 2.688e+01 3.061e+01 4.317e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 00:05:51,499 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 00:05:53,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1894290.0, ans=0.1 2024-08-13 00:06:13,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1050, loss[loss=0.09399, beats_loss=0.009982, ecapa_loss=0.0001927, whisper_loss=0.08208, over 21590.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001629, whisper_loss=0.08979, over 3778053.17 frames. ], batch size: 90, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:06:22,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2024-08-13 00:06:33,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1894590.0, ans=0.0 2024-08-13 00:06:49,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-13 00:07:08,158 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 00:07:24,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2024-08-13 00:07:34,278 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1100, loss[loss=0.1037, beats_loss=0.0132, ecapa_loss=0.0001577, whisper_loss=0.08888, over 21873.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001642, whisper_loss=0.08997, over 3757965.14 frames. 
], batch size: 89, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:07:47,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1894990.0, ans=0.0 2024-08-13 00:08:00,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2024-08-13 00:08:08,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2024-08-13 00:08:25,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.502e+01 2.869e+01 3.346e+01 6.186e+01, threshold=5.739e+01, percent-clipped=2.0 2024-08-13 00:08:35,080 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 00:08:49,813 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 00:08:51,181 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1150, loss[loss=0.1238, beats_loss=0.009724, ecapa_loss=0.0001812, whisper_loss=0.1123, over 21396.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.000163, whisper_loss=0.09111, over 3817127.30 frames. ], batch size: 85, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:09:04,646 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 00:09:09,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1895590.0, ans=0.2 2024-08-13 00:09:20,859 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
24 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-13 00:09:22,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1895690.0, ans=0.125 2024-08-13 00:09:27,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1895690.0, ans=0.2 2024-08-13 00:09:31,657 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 00:09:55,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1895890.0, ans=0.125 2024-08-13 00:10:03,316 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:10:10,386 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1200, loss[loss=0.09008, beats_loss=0.01137, ecapa_loss=0.000166, whisper_loss=0.07706, over 15074.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.000163, whisper_loss=0.09073, over 3803037.20 frames. ], batch size: 59, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:10:15,050 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 25 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-13 00:10:23,914 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 00:10:27,704 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 00:10:29,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-13 00:10:45,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.78 vs. 
limit=15.0 2024-08-13 00:11:06,009 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.344e+01 2.617e+01 3.051e+01 6.950e+01, threshold=5.235e+01, percent-clipped=1.0 2024-08-13 00:11:09,471 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 00:11:11,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1896290.0, ans=0.125 2024-08-13 00:11:17,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1896390.0, ans=0.2 2024-08-13 00:11:29,268 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 00:11:31,735 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1250, loss[loss=0.09789, beats_loss=0.01234, ecapa_loss=0.0001591, whisper_loss=0.08396, over 15891.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01087, ecapa_loss=0.0001626, whisper_loss=0.0905, over 3797039.89 frames. ], batch size: 67, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:11:46,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1896590.0, ans=0.125 2024-08-13 00:12:08,088 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:12:22,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1896790.0, ans=0.0 2024-08-13 00:12:31,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1896790.0, ans=0.0 2024-08-13 00:12:34,970 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
15 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 00:12:40,985 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:12:50,295 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1300, loss[loss=0.09655, beats_loss=0.01332, ecapa_loss=0.0001521, whisper_loss=0.08171, over 22177.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0109, ecapa_loss=0.000163, whisper_loss=0.09022, over 3770201.03 frames. ], batch size: 91, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:13:29,580 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:13:37,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1897290.0, ans=0.0 2024-08-13 00:13:42,299 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 00:13:43,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.447e+01 2.732e+01 3.060e+01 1.003e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-13 00:13:59,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1897390.0, ans=0.125 2024-08-13 00:13:59,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1897390.0, ans=0.125 2024-08-13 00:14:12,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1350, loss[loss=0.1005, beats_loss=0.01164, ecapa_loss=0.000153, whisper_loss=0.0873, over 16819.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01088, ecapa_loss=0.0001628, whisper_loss=0.09023, over 3792137.30 frames. 
], batch size: 65, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:14:26,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1897490.0, ans=0.0 2024-08-13 00:14:39,713 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 33 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-13 00:14:52,682 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 00:14:59,634 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 00:14:59,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1897690.0, ans=0.0 2024-08-13 00:15:01,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1897790.0, ans=0.125 2024-08-13 00:15:16,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1897890.0, ans=0.0 2024-08-13 00:15:16,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1897890.0, ans=0.125 2024-08-13 00:15:17,332 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 00:15:21,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1897890.0, ans=0.0 2024-08-13 00:15:32,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1400, loss[loss=0.1112, beats_loss=0.009847, ecapa_loss=0.0001623, whisper_loss=0.0997, over 17273.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001645, whisper_loss=0.08985, over 3768915.91 frames. 
], batch size: 66, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:15:41,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1897990.0, ans=0.125 2024-08-13 00:16:09,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-08-13 00:16:20,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-13 00:16:25,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.414e+01 2.708e+01 3.137e+01 5.162e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-13 00:16:37,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1898390.0, ans=0.125 2024-08-13 00:16:40,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1898390.0, ans=0.0 2024-08-13 00:16:42,873 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 00:16:45,106 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-13 00:16:54,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1450, loss[loss=0.1158, beats_loss=0.007963, ecapa_loss=0.0001871, whisper_loss=0.1059, over 23251.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001633, whisper_loss=0.09011, over 3800773.64 frames. ], batch size: 92, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:17:26,060 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 00:17:36,283 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-08-13 00:18:05,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2024-08-13 00:18:43,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.53 vs. limit=10.0 2024-08-13 00:18:43,405 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1500, loss[loss=0.09277, beats_loss=0.01125, ecapa_loss=0.0001762, whisper_loss=0.07976, over 17523.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01079, ecapa_loss=0.0001633, whisper_loss=0.08982, over 3787946.47 frames. ], batch size: 69, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:18:53,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-13 00:18:57,387 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 00:19:18,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1899190.0, ans=0.05 2024-08-13 00:19:24,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1899190.0, ans=0.09899494936611666 2024-08-13 00:19:32,727 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. 
limit=15.0 2024-08-13 00:19:35,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.418e+01 2.688e+01 3.116e+01 4.487e+01, threshold=5.376e+01, percent-clipped=0.0 2024-08-13 00:19:53,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1899390.0, ans=0.0 2024-08-13 00:19:55,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1899390.0, ans=0.0 2024-08-13 00:20:02,726 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1550, loss[loss=0.1108, beats_loss=0.009988, ecapa_loss=0.000136, whisper_loss=0.09944, over 20025.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001629, whisper_loss=0.09089, over 3808314.06 frames. ], batch size: 77, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:20:15,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1899490.0, ans=0.125 2024-08-13 00:20:24,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2024-08-13 00:20:25,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1899590.0, ans=0.125 2024-08-13 00:20:31,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1899590.0, ans=0.0 2024-08-13 00:20:45,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1899690.0, ans=0.2 2024-08-13 00:20:51,998 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 00:20:55,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1899790.0, ans=0.0 2024-08-13 00:21:20,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1600, loss[loss=0.0887, beats_loss=0.01157, ecapa_loss=0.0001526, whisper_loss=0.0756, over 20263.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.09009, over 3822398.29 frames. ], batch size: 79, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:21:24,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1899990.0, ans=0.125 2024-08-13 00:21:45,941 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 00:21:47,456 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 32 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 00:21:59,911 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 00:22:01,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1900190.0, ans=0.09899494936611666 2024-08-13 00:22:06,254 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
23 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 00:22:12,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.583e+01 2.856e+01 3.340e+01 1.108e+02, threshold=5.712e+01, percent-clipped=2.0 2024-08-13 00:22:13,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1900290.0, ans=0.125 2024-08-13 00:22:21,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1900290.0, ans=0.125 2024-08-13 00:22:27,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-13 00:22:31,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1900390.0, ans=0.1 2024-08-13 00:22:38,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1650, loss[loss=0.1019, beats_loss=0.009699, ecapa_loss=0.0001839, whisper_loss=0.09038, over 23266.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001632, whisper_loss=0.09084, over 3856469.95 frames. ], batch size: 91, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:22:50,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1900490.0, ans=0.0 2024-08-13 00:22:59,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1900590.0, ans=0.0 2024-08-13 00:23:07,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1900690.0, ans=0.125 2024-08-13 00:23:29,089 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 00:23:30,847 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 00:23:34,963 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 00:23:39,898 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 00:23:40,289 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:23:53,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1700, loss[loss=0.09613, beats_loss=0.0122, ecapa_loss=0.0001477, whisper_loss=0.08246, over 19185.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001643, whisper_loss=0.09079, over 3833196.99 frames. ], batch size: 78, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:23:57,535 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 00:24:10,058 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 00:24:11,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1901090.0, ans=0.125 2024-08-13 00:24:14,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1901090.0, ans=0.1 2024-08-13 00:24:14,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1901090.0, ans=0.125 2024-08-13 00:24:31,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1901190.0, ans=0.0 2024-08-13 00:24:35,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1901190.0, ans=0.125 2024-08-13 00:24:42,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.360e+01 
2.688e+01 2.973e+01 4.042e+01, threshold=5.375e+01, percent-clipped=0.0 2024-08-13 00:24:54,886 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 00:24:57,586 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-13 00:25:02,893 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.142e+00 2024-08-13 00:25:07,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1750, loss[loss=0.07104, beats_loss=0.01103, ecapa_loss=0.0001426, whisper_loss=0.05859, over 16153.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001644, whisper_loss=0.0903, over 3859341.82 frames. ], batch size: 62, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:25:11,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0 2024-08-13 00:25:28,694 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.32 vs. limit=22.5 2024-08-13 00:25:52,374 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 00:25:52,799 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-13 00:26:12,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1901890.0, ans=0.125 2024-08-13 00:26:15,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. 
limit=10.0 2024-08-13 00:26:20,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1800, loss[loss=0.0969, beats_loss=0.01312, ecapa_loss=0.000142, whisper_loss=0.08236, over 22824.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001639, whisper_loss=0.09004, over 3829125.40 frames. ], batch size: 94, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:26:27,426 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 00:26:30,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1901990.0, ans=0.2 2024-08-13 00:26:37,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1902090.0, ans=0.04949747468305833 2024-08-13 00:26:48,860 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 00:26:52,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1902190.0, ans=0.125 2024-08-13 00:26:53,450 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 00:27:02,815 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 00:27:10,013 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
22 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-13 00:27:12,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.455e+01 2.703e+01 3.083e+01 4.143e+01, threshold=5.406e+01, percent-clipped=0.0 2024-08-13 00:27:22,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902390.0, ans=0.1 2024-08-13 00:27:27,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1902390.0, ans=0.2 2024-08-13 00:27:29,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1902390.0, ans=0.125 2024-08-13 00:27:40,224 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1850, loss[loss=0.1086, beats_loss=0.007834, ecapa_loss=0.0001937, whisper_loss=0.09888, over 14669.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.000165, whisper_loss=0.09008, over 3796139.85 frames. ], batch size: 59, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:28:03,965 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 00:28:08,497 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 00:28:13,312 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 00:28:17,171 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
19 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-13 00:28:20,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1902690.0, ans=0.2 2024-08-13 00:29:00,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1902990.0, ans=0.0 2024-08-13 00:29:00,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=12.0 2024-08-13 00:29:00,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1900, loss[loss=0.1177, beats_loss=0.008017, ecapa_loss=0.0001745, whisper_loss=0.1079, over 16228.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01072, ecapa_loss=0.0001657, whisper_loss=0.09053, over 3803090.58 frames. ], batch size: 60, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:29:12,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1902990.0, ans=0.025 2024-08-13 00:29:16,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1903090.0, ans=0.125 2024-08-13 00:29:29,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1903090.0, ans=0.125 2024-08-13 00:29:31,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1903190.0, ans=0.1 2024-08-13 00:29:53,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.471e+01 2.746e+01 3.040e+01 5.075e+01, threshold=5.492e+01, percent-clipped=0.0 2024-08-13 00:29:54,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1903290.0, ans=0.125 2024-08-13 00:29:58,812 INFO [train_multi_KD3.py:844] (1/4) A total of 
91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 00:30:03,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=15.0 2024-08-13 00:30:08,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2024-08-13 00:30:20,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 1950, loss[loss=0.1015, beats_loss=0.01133, ecapa_loss=0.0001253, whisper_loss=0.08894, over 19487.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01081, ecapa_loss=0.0001654, whisper_loss=0.08987, over 3835479.74 frames. ], batch size: 73, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:30:22,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1903490.0, ans=0.0 2024-08-13 00:30:44,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1903590.0, ans=0.2 2024-08-13 00:30:52,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1903690.0, ans=0.125 2024-08-13 00:30:58,354 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 20 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-13 00:30:58,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1903690.0, ans=0.05 2024-08-13 00:31:05,734 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 00:31:10,482 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. 
limit=15.0 2024-08-13 00:31:14,139 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 00:31:25,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1903890.0, ans=0.125 2024-08-13 00:31:30,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1903890.0, ans=0.125 2024-08-13 00:31:37,936 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 00:31:38,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1903990.0, ans=0.125 2024-08-13 00:31:39,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2000, loss[loss=0.1015, beats_loss=0.009187, ecapa_loss=0.000218, whisper_loss=0.09016, over 15868.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01085, ecapa_loss=0.0001658, whisper_loss=0.08944, over 3848575.95 frames. ], batch size: 64, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:32:01,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.74 vs. limit=5.0 2024-08-13 00:32:24,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-13 00:32:27,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1904290.0, ans=0.125 2024-08-13 00:32:30,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.391e+01 2.734e+01 3.144e+01 4.841e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-13 00:32:45,682 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 00:32:45,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1904390.0, ans=0.07 2024-08-13 00:32:47,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1904390.0, ans=0.0 2024-08-13 00:32:50,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1904390.0, ans=0.0 2024-08-13 00:32:50,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1904390.0, ans=0.125 2024-08-13 00:32:56,036 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2050, loss[loss=0.0991, beats_loss=0.01304, ecapa_loss=0.0001298, whisper_loss=0.08477, over 23670.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01091, ecapa_loss=0.0001656, whisper_loss=0.08895, over 3824513.24 frames. ], batch size: 91, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:33:07,990 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 00:33:11,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1904590.0, ans=0.125 2024-08-13 00:33:23,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1904590.0, ans=0.2 2024-08-13 00:33:31,628 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 00:33:35,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1904690.0, ans=0.0 2024-08-13 00:33:42,685 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 00:34:00,913 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 00:34:12,440 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2100, loss[loss=0.1132, beats_loss=0.008483, ecapa_loss=0.0002033, whisper_loss=0.1027, over 15044.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01099, ecapa_loss=0.0001633, whisper_loss=0.08862, over 3816671.85 frames. ], batch size: 63, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:34:18,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=12.0 2024-08-13 00:34:27,500 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 00:34:28,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-13 00:34:42,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1905190.0, ans=0.2 2024-08-13 00:35:00,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-13 00:35:03,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.317e+01 2.588e+01 2.864e+01 4.791e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-13 00:35:04,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1905290.0, ans=0.0 2024-08-13 00:35:11,963 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
29 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 00:35:15,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1905390.0, ans=0.0 2024-08-13 00:35:29,572 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2150, loss[loss=0.0852, beats_loss=0.01308, ecapa_loss=0.000144, whisper_loss=0.07068, over 17995.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01099, ecapa_loss=0.0001626, whisper_loss=0.08895, over 3802317.39 frames. ], batch size: 72, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:35:35,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1905490.0, ans=0.125 2024-08-13 00:36:03,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1905690.0, ans=0.0 2024-08-13 00:36:06,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5 2024-08-13 00:36:15,555 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 00:36:22,177 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 28 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 00:36:28,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-13 00:36:39,763 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 00:36:45,413 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 00:36:51,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2200, loss[loss=0.1278, beats_loss=0.009265, ecapa_loss=0.0002022, whisper_loss=0.1165, over 20233.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01099, ecapa_loss=0.0001626, whisper_loss=0.08937, over 3782341.71 frames. ], batch size: 78, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:36:52,119 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 00:36:59,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-13 00:37:05,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1905990.0, ans=0.5 2024-08-13 00:37:10,426 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 00:37:32,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-13 00:37:33,589 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 00:37:43,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1906290.0, ans=0.125 2024-08-13 00:37:45,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.358e+01 2.742e+01 3.274e+01 9.057e+01, threshold=5.483e+01, percent-clipped=3.0 2024-08-13 00:38:13,415 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2250, loss[loss=0.1161, beats_loss=0.01002, ecapa_loss=0.0001748, whisper_loss=0.1043, over 23302.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01104, ecapa_loss=0.0001637, whisper_loss=0.09015, over 3824982.37 frames. ], batch size: 91, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:38:14,095 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 00:38:32,243 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 00:39:03,674 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 00:39:22,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1906890.0, ans=0.1 2024-08-13 00:39:37,964 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2300, loss[loss=0.1012, beats_loss=0.01014, ecapa_loss=0.0001649, whisper_loss=0.08942, over 20654.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001656, whisper_loss=0.09105, over 3846529.65 frames. ], batch size: 81, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:40:15,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2024-08-13 00:40:25,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1907190.0, ans=0.125 2024-08-13 00:40:27,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1907290.0, ans=0.0 2024-08-13 00:40:28,348 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 00:40:32,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.474e+01 2.795e+01 3.232e+01 6.818e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-13 00:40:33,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1907290.0, ans=0.035 2024-08-13 00:40:41,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1907290.0, ans=0.2 2024-08-13 00:40:47,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1907390.0, ans=0.0 2024-08-13 00:40:50,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1907390.0, ans=0.0 2024-08-13 00:40:51,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1907390.0, ans=0.04949747468305833 2024-08-13 00:40:53,441 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 00:40:53,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1907390.0, ans=0.0 2024-08-13 00:40:56,783 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 00:41:00,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2350, loss[loss=0.1142, beats_loss=0.01011, ecapa_loss=0.0001575, whisper_loss=0.1025, over 22099.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001661, whisper_loss=0.09101, over 3841468.76 frames. 
], batch size: 86, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:41:09,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1907490.0, ans=0.05 2024-08-13 00:41:30,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2024-08-13 00:41:40,876 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 32 from LS+wenet, 14 from Vox, 12 fro AS 2024-08-13 00:41:45,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1907690.0, ans=0.0 2024-08-13 00:41:48,889 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 00:41:49,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1907790.0, ans=0.0 2024-08-13 00:42:03,199 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 00:42:03,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1907790.0, ans=0.09899494936611666 2024-08-13 00:42:04,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1907790.0, ans=0.125 2024-08-13 00:42:06,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1907890.0, ans=0.1 2024-08-13 00:42:22,829 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2400, loss[loss=0.09695, beats_loss=0.00941, ecapa_loss=0.000129, whisper_loss=0.08625, over 15760.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001672, whisper_loss=0.09124, over 3830248.36 frames. 
], batch size: 55, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:42:54,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1908190.0, ans=0.025 2024-08-13 00:43:16,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.472e+01 2.673e+01 3.015e+01 1.435e+02, threshold=5.346e+01, percent-clipped=1.0 2024-08-13 00:43:19,011 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 00:43:25,596 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 00:43:37,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1908390.0, ans=0.0 2024-08-13 00:43:39,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1908390.0, ans=0.125 2024-08-13 00:43:43,639 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 00:43:45,250 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2450, loss[loss=0.1082, beats_loss=0.01002, ecapa_loss=0.0002425, whisper_loss=0.09579, over 14474.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001668, whisper_loss=0.09189, over 3884468.72 frames. 
], batch size: 61, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:43:49,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1908490.0, ans=0.0 2024-08-13 00:43:53,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1908490.0, ans=0.04949747468305833 2024-08-13 00:44:16,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1908690.0, ans=0.125 2024-08-13 00:44:22,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1908690.0, ans=0.0 2024-08-13 00:44:43,487 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 00:44:56,865 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 00:44:57,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1908890.0, ans=0.125 2024-08-13 00:45:04,478 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 25 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-13 00:45:06,536 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2500, loss[loss=0.1265, beats_loss=0.008959, ecapa_loss=0.0001926, whisper_loss=0.1156, over 14909.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001678, whisper_loss=0.09192, over 3897523.32 frames. ], batch size: 59, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:45:39,724 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 00:45:43,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1909190.0, ans=0.1 2024-08-13 00:45:45,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-13 00:45:47,970 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 00:46:01,284 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.851e+01 3.287e+01 4.773e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-13 00:46:04,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-08-13 00:46:05,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0 2024-08-13 00:46:12,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1909290.0, ans=0.0 2024-08-13 00:46:19,317 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 00:46:24,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1909390.0, ans=0.2 2024-08-13 00:46:26,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-13 00:46:29,674 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 00:46:30,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1909490.0, ans=0.125 2024-08-13 00:46:31,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2550, loss[loss=0.09888, beats_loss=0.0106, ecapa_loss=0.0001781, whisper_loss=0.08649, over 21249.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.0001673, whisper_loss=0.09194, over 3888457.22 frames. ], batch size: 88, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:46:41,106 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 00:46:55,811 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 00:47:20,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1909790.0, ans=0.125 2024-08-13 00:47:29,883 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-13 00:47:32,840 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 00:47:34,826 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 00:47:47,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1909890.0, ans=0.125 2024-08-13 00:47:53,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2600, loss[loss=0.1128, beats_loss=0.009575, ecapa_loss=0.0001601, whisper_loss=0.1016, over 15235.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.000168, whisper_loss=0.09194, over 3881284.96 frames. 
], batch size: 60, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:47:54,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1909990.0, ans=0.125 2024-08-13 00:47:55,566 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 00:48:00,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1909990.0, ans=0.125 2024-08-13 00:48:09,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1909990.0, ans=0.125 2024-08-13 00:48:22,257 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 00:48:24,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1910090.0, ans=0.0 2024-08-13 00:48:25,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1910090.0, ans=0.125 2024-08-13 00:48:52,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.514e+01 2.741e+01 3.048e+01 4.490e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-13 00:49:19,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1910390.0, ans=0.1 2024-08-13 00:49:21,799 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2650, loss[loss=0.08808, beats_loss=0.01361, ecapa_loss=0.000146, whisper_loss=0.073, over 18433.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001685, whisper_loss=0.09158, over 3879947.04 frames. ], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:49:38,030 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
18 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-13 00:49:42,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-13 00:49:48,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1910590.0, ans=0.0 2024-08-13 00:49:53,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1910690.0, ans=0.035 2024-08-13 00:50:27,361 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 13 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 00:50:38,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1910890.0, ans=0.125 2024-08-13 00:50:39,672 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 00:50:43,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2700, loss[loss=0.1006, beats_loss=0.01392, ecapa_loss=0.0001808, whisper_loss=0.08482, over 22440.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001688, whisper_loss=0.09114, over 3855827.53 frames. ], batch size: 92, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:50:48,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1910990.0, ans=0.0 2024-08-13 00:50:55,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1910990.0, ans=0.0 2024-08-13 00:51:01,186 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 00:51:03,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1911090.0, ans=0.2 2024-08-13 00:51:03,645 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.841e-01 2024-08-13 00:51:05,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-13 00:51:08,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1911090.0, ans=0.125 2024-08-13 00:51:13,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1911090.0, ans=0.1 2024-08-13 00:51:20,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1911190.0, ans=0.0 2024-08-13 00:51:38,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.492e+01 2.764e+01 3.227e+01 2.218e+02, threshold=5.527e+01, percent-clipped=1.0 2024-08-13 00:51:58,547 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 00:52:06,527 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2750, loss[loss=0.08862, beats_loss=0.0117, ecapa_loss=0.0001678, whisper_loss=0.07525, over 18188.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01092, ecapa_loss=0.0001679, whisper_loss=0.09052, over 3837263.78 frames. 
], batch size: 74, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:52:07,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1911490.0, ans=0.125 2024-08-13 00:52:10,537 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:52:12,943 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 00:52:19,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-13 00:52:20,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1911490.0, ans=0.1 2024-08-13 00:52:26,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1911590.0, ans=0.0 2024-08-13 00:52:35,472 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 00:52:40,530 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 00:52:43,552 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 00:52:58,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2024-08-13 00:53:16,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1911890.0, ans=10.0 2024-08-13 00:53:31,008 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2800, loss[loss=0.08429, beats_loss=0.01446, ecapa_loss=0.0001717, whisper_loss=0.06811, over 21996.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001682, whisper_loss=0.09081, over 3855976.19 frames. ], batch size: 94, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:53:36,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1911990.0, ans=0.125 2024-08-13 00:53:39,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1911990.0, ans=0.125 2024-08-13 00:53:57,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1912090.0, ans=0.2 2024-08-13 00:53:57,154 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.215e-01 2024-08-13 00:54:01,289 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-13 00:54:02,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1912090.0, ans=0.125 2024-08-13 00:54:02,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1912090.0, ans=0.125 2024-08-13 00:54:05,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1912190.0, ans=0.125 2024-08-13 00:54:23,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1912290.0, ans=0.125 2024-08-13 00:54:26,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1912290.0, ans=0.2 2024-08-13 00:54:28,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.473e+01 2.733e+01 3.017e+01 4.460e+01, threshold=5.467e+01, 
percent-clipped=0.0 2024-08-13 00:54:30,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1912290.0, ans=0.125 2024-08-13 00:54:37,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.72 vs. limit=22.5 2024-08-13 00:54:40,696 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 00:54:53,107 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.548e-01 2024-08-13 00:54:56,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2024-08-13 00:54:57,329 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2850, loss[loss=0.09559, beats_loss=0.01291, ecapa_loss=0.0001335, whisper_loss=0.08134, over 19008.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01095, ecapa_loss=0.0001676, whisper_loss=0.09104, over 3888427.84 frames. ], batch size: 73, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:55:03,473 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 00:55:05,594 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-13 00:55:09,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1912490.0, ans=0.125 2024-08-13 00:55:14,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1912590.0, ans=0.2 2024-08-13 00:55:55,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.16 vs. 
limit=15.0 2024-08-13 00:56:11,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1912890.0, ans=0.2 2024-08-13 00:56:15,254 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 00:56:20,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2900, loss[loss=0.07529, beats_loss=0.01443, ecapa_loss=0.0001735, whisper_loss=0.05913, over 21578.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001684, whisper_loss=0.09103, over 3897874.77 frames. ], batch size: 92, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:56:21,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1912990.0, ans=0.0 2024-08-13 00:56:24,934 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 00:56:43,991 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 00:56:57,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1913190.0, ans=0.125 2024-08-13 00:57:18,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.481e+01 2.818e+01 3.186e+01 4.138e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-13 00:57:24,074 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 00:57:34,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1913390.0, ans=0.125 2024-08-13 00:57:45,362 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 2950, loss[loss=0.09204, beats_loss=0.01281, ecapa_loss=0.0001634, whisper_loss=0.07759, over 20219.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001677, whisper_loss=0.09182, over 3905832.98 frames. 
], batch size: 83, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:57:48,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1913490.0, ans=0.125 2024-08-13 00:57:48,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1913490.0, ans=0.125 2024-08-13 00:57:57,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1913490.0, ans=10.0 2024-08-13 00:58:06,020 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 00:58:20,405 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-13 00:58:43,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1913790.0, ans=0.2 2024-08-13 00:58:52,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1913890.0, ans=0.1 2024-08-13 00:58:52,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1913890.0, ans=0.125 2024-08-13 00:59:02,547 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3000, loss[loss=0.1188, beats_loss=0.008155, ecapa_loss=0.0001918, whisper_loss=0.1088, over 15686.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001665, whisper_loss=0.09171, over 3916308.16 frames. 
], batch size: 60, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:59:02,548 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 00:59:43,480 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005759, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 01:00:02,066 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on SV_voxceleb1: loss=0.004628, beats_loss=0, ecapa_loss=0.0004628, whisper_loss=0, over 939242.00 frames. 2024-08-13 01:01:59,091 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7781, 1.4537, 1.6825, 1.3475, 1.1094, 1.6222, 2.0878, 1.1273], device='cuda:1') 2024-08-13 01:01:59,751 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on AT_audioset: loss=0.02407, beats_loss=0.02407, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 01:01:59,755 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 01:02:07,593 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 32 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 01:02:12,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1913990.0, ans=0.0 2024-08-13 01:02:13,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. 
limit=6.0 2024-08-13 01:02:25,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1914090.0, ans=0.125 2024-08-13 01:02:32,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1914190.0, ans=0.04949747468305833 2024-08-13 01:02:33,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1914190.0, ans=0.0 2024-08-13 01:02:41,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2024-08-13 01:02:41,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.87 vs. limit=22.5 2024-08-13 01:02:50,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.523e+01 2.729e+01 3.233e+01 5.051e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-13 01:02:52,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1914290.0, ans=0.1 2024-08-13 01:03:07,020 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 01:03:08,463 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 01:03:16,478 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3050, loss[loss=0.0933, beats_loss=0.009684, ecapa_loss=0.0002012, whisper_loss=0.0816, over 14579.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.0001676, whisper_loss=0.09167, over 3911660.79 frames. 
], batch size: 60, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:03:24,233 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-13 01:03:30,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-08-13 01:03:53,881 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 01:03:57,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1914690.0, ans=0.125 2024-08-13 01:04:00,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1914790.0, ans=0.125 2024-08-13 01:04:07,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1914790.0, ans=0.1 2024-08-13 01:04:13,230 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 22 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-13 01:04:25,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1914890.0, ans=0.2 2024-08-13 01:04:28,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=12.0 2024-08-13 01:04:30,254 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3100, loss[loss=0.07315, beats_loss=0.01275, ecapa_loss=0.0001817, whisper_loss=0.05858, over 14450.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001686, whisper_loss=0.09191, over 3879298.19 frames. 
], batch size: 60, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:04:36,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1914990.0, ans=0.125 2024-08-13 01:04:47,263 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 01:04:47,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=12.0 2024-08-13 01:04:59,938 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 26 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 01:05:09,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1915190.0, ans=0.0 2024-08-13 01:05:17,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1915290.0, ans=0.125 2024-08-13 01:05:18,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.430e+01 2.726e+01 3.080e+01 5.396e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 01:05:42,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-13 01:05:44,506 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3150, loss[loss=0.08938, beats_loss=0.01392, ecapa_loss=0.0001599, whisper_loss=0.07386, over 16002.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.000169, whisper_loss=0.09132, over 3866731.67 frames. 
], batch size: 66, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:05:49,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1915490.0, ans=0.125 2024-08-13 01:06:08,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1915590.0, ans=0.125 2024-08-13 01:06:25,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1915690.0, ans=0.2 2024-08-13 01:06:50,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-13 01:06:52,706 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 01:06:54,008 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 01:06:57,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1915990.0, ans=0.125 2024-08-13 01:06:58,195 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3200, loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001536, whisper_loss=0.09036, over 17035.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001686, whisper_loss=0.0912, over 3873067.59 frames. ], batch size: 65, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:07:07,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1915990.0, ans=0.125 2024-08-13 01:07:17,650 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
31 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 01:07:17,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1916090.0, ans=0.125 2024-08-13 01:07:18,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-13 01:07:25,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1916090.0, ans=0.1 2024-08-13 01:07:45,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.363e+01 2.691e+01 2.946e+01 6.786e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 01:08:05,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2024-08-13 01:08:10,855 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3250, loss[loss=0.1061, beats_loss=0.01067, ecapa_loss=0.0001842, whisper_loss=0.09358, over 22756.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001691, whisper_loss=0.09127, over 3866807.35 frames. ], batch size: 94, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:08:21,049 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-13 01:08:22,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1916490.0, ans=0.1 2024-08-13 01:08:31,461 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 01:08:33,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. 
limit=22.5 2024-08-13 01:08:35,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1916590.0, ans=0.125 2024-08-13 01:08:38,160 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 01:08:39,487 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 01:08:46,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-13 01:08:47,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1916690.0, ans=0.0 2024-08-13 01:08:53,040 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 01:08:55,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-13 01:08:56,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1916790.0, ans=0.2 2024-08-13 01:09:21,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1916890.0, ans=0.0 2024-08-13 01:09:21,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1916890.0, ans=0.125 2024-08-13 01:09:25,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3300, loss[loss=0.08781, beats_loss=0.01297, ecapa_loss=0.0001852, whisper_loss=0.07299, over 18912.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01111, ecapa_loss=0.0001669, whisper_loss=0.09059, over 3852343.32 frames. 
], batch size: 81, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:09:45,261 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 01:09:48,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1917090.0, ans=10.0 2024-08-13 01:09:50,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1917090.0, ans=0.0 2024-08-13 01:09:55,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1917190.0, ans=0.1 2024-08-13 01:09:57,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2024-08-13 01:10:04,362 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 01:10:09,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1917290.0, ans=0.125 2024-08-13 01:10:11,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=15.0 2024-08-13 01:10:13,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.401e+01 2.681e+01 3.036e+01 4.663e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 01:10:14,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2024-08-13 01:10:36,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. 
limit=15.0 2024-08-13 01:10:38,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2024-08-13 01:10:38,824 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3350, loss[loss=0.1257, beats_loss=0.009181, ecapa_loss=0.0001879, whisper_loss=0.1146, over 21916.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001681, whisper_loss=0.09164, over 3867276.81 frames. ], batch size: 89, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:10:51,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1917490.0, ans=0.1 2024-08-13 01:10:56,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1917590.0, ans=0.025 2024-08-13 01:11:12,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1917690.0, ans=0.1 2024-08-13 01:11:13,465 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 01:11:15,466 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 01:11:28,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1917790.0, ans=0.125 2024-08-13 01:11:46,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.59 vs. limit=10.0 2024-08-13 01:11:55,795 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3400, loss[loss=0.1027, beats_loss=0.01026, ecapa_loss=0.0001695, whisper_loss=0.09076, over 20970.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01104, ecapa_loss=0.000167, whisper_loss=0.09076, over 3876387.78 frames. 
], batch size: 84, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:12:00,880 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 01:12:10,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1918090.0, ans=0.07 2024-08-13 01:12:10,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.56 vs. limit=22.5 2024-08-13 01:12:14,773 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 01:12:24,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.63 vs. limit=22.5 2024-08-13 01:12:25,011 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 01:12:34,119 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 01:12:41,649 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 01:12:45,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.443e+01 2.703e+01 3.105e+01 5.409e+01, threshold=5.407e+01, percent-clipped=1.0 2024-08-13 01:12:45,898 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 01:12:50,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1918290.0, ans=0.05 2024-08-13 01:12:57,139 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 7 from Vox, 30 fro AS 2024-08-13 01:13:03,236 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 01:13:10,379 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3450, loss[loss=0.08913, beats_loss=0.01403, ecapa_loss=0.0001478, whisper_loss=0.07362, over 22787.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01106, ecapa_loss=0.0001678, whisper_loss=0.09076, over 3902455.42 frames. ], batch size: 92, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:13:19,420 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 01:13:39,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1918690.0, ans=0.0 2024-08-13 01:14:20,061 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3500, loss[loss=0.1289, beats_loss=0.007925, ecapa_loss=0.0001788, whisper_loss=0.1192, over 23403.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01107, ecapa_loss=0.0001695, whisper_loss=0.09098, over 3904924.87 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:14:26,818 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 01:14:37,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. 
limit=22.5 2024-08-13 01:14:38,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1919090.0, ans=0.0 2024-08-13 01:14:46,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1919190.0, ans=0.125 2024-08-13 01:15:00,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1919290.0, ans=0.125 2024-08-13 01:15:03,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1919290.0, ans=0.0 2024-08-13 01:15:05,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.466e+01 2.782e+01 3.112e+01 6.873e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-13 01:15:10,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1919290.0, ans=0.1 2024-08-13 01:15:13,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-08-13 01:15:17,039 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 01:15:18,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1919390.0, ans=0.0 2024-08-13 01:15:21,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1919390.0, ans=0.09899494936611666 2024-08-13 01:15:27,121 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-13 01:15:29,698 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3550, loss[loss=0.1023, beats_loss=0.01016, ecapa_loss=0.0002145, whisper_loss=0.08997, over 15640.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01108, ecapa_loss=0.0001685, whisper_loss=0.09109, over 3922358.10 frames. ], batch size: 60, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:15:29,968 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 01:15:31,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1919490.0, ans=0.125 2024-08-13 01:15:33,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1919490.0, ans=0.125 2024-08-13 01:15:40,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-08-13 01:15:43,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1919590.0, ans=0.0 2024-08-13 01:15:43,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1919590.0, ans=0.125 2024-08-13 01:15:47,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1919590.0, ans=0.0 2024-08-13 01:16:03,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1919690.0, ans=0.2 2024-08-13 01:16:20,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1919790.0, ans=0.0 2024-08-13 01:16:32,432 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 01:16:34,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. 
limit=15.0 2024-08-13 01:16:36,605 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 36 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 01:16:39,449 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 01:16:40,609 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3600, loss[loss=0.09179, beats_loss=0.01033, ecapa_loss=0.0001804, whisper_loss=0.07966, over 15358.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01101, ecapa_loss=0.000169, whisper_loss=0.09097, over 3896136.25 frames. ], batch size: 62, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:16:55,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1919990.0, ans=0.125 2024-08-13 01:16:59,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1920090.0, ans=10.0 2024-08-13 01:17:09,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0 2024-08-13 01:17:11,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-08-13 01:17:15,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1920190.0, ans=0.1 2024-08-13 01:17:20,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1920190.0, ans=0.0 2024-08-13 01:17:27,665 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 01:17:30,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.424e+01 2.680e+01 3.106e+01 1.010e+02, threshold=5.360e+01, percent-clipped=5.0 2024-08-13 01:17:40,165 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 01:17:47,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1920390.0, ans=0.125 2024-08-13 01:17:53,654 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3650, loss[loss=0.1321, beats_loss=0.01034, ecapa_loss=0.0001358, whisper_loss=0.1204, over 24561.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001694, whisper_loss=0.09115, over 3894328.85 frames. ], batch size: 93, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:18:09,241 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 01:18:37,295 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 01:18:42,698 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-13 01:18:50,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1920890.0, ans=0.0 2024-08-13 01:18:50,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1920890.0, ans=0.125 2024-08-13 01:18:58,039 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-13 01:19:03,626 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3700, loss[loss=0.08521, beats_loss=0.01344, ecapa_loss=0.0001304, whisper_loss=0.07046, over 20442.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001687, whisper_loss=0.09125, over 3867228.63 frames. 
], batch size: 81, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:19:15,262 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 01:19:17,951 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 01:19:23,934 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.357e-02 2024-08-13 01:19:33,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1921190.0, ans=0.07 2024-08-13 01:19:34,852 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 26 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-13 01:19:44,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1921290.0, ans=0.0 2024-08-13 01:19:44,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1921290.0, ans=0.0 2024-08-13 01:19:44,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1921290.0, ans=0.2 2024-08-13 01:19:49,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.425e+01 2.811e+01 3.262e+01 7.758e+01, threshold=5.621e+01, percent-clipped=2.0 2024-08-13 01:19:51,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1921290.0, ans=0.125 2024-08-13 01:19:53,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1921290.0, ans=0.125 2024-08-13 01:20:01,383 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 01:20:13,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3750, loss[loss=0.09229, beats_loss=0.0117, ecapa_loss=0.0002002, whisper_loss=0.07859, over 21327.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001681, whisper_loss=0.09122, over 3870197.86 frames. ], batch size: 91, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:21:01,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1921790.0, ans=0.125 2024-08-13 01:21:16,097 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 01:21:23,251 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3800, loss[loss=0.0842, beats_loss=0.01329, ecapa_loss=0.000195, whisper_loss=0.06895, over 21649.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001693, whisper_loss=0.09157, over 3897827.97 frames. ], batch size: 91, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:21:47,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1922090.0, ans=0.2 2024-08-13 01:21:50,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1922190.0, ans=0.125 2024-08-13 01:21:51,196 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 01:21:52,669 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 01:21:52,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1922190.0, ans=0.0 2024-08-13 01:22:07,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. 
limit=15.0 2024-08-13 01:22:08,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.489e+01 2.785e+01 3.114e+01 6.895e+01, threshold=5.569e+01, percent-clipped=1.0 2024-08-13 01:22:13,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1922290.0, ans=0.125 2024-08-13 01:22:27,197 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 01:22:30,004 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 01:22:32,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3850, loss[loss=0.1172, beats_loss=0.009544, ecapa_loss=0.0001752, whisper_loss=0.1059, over 20207.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001695, whisper_loss=0.0919, over 3912958.29 frames. ], batch size: 81, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:22:42,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1922490.0, ans=0.125 2024-08-13 01:22:47,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1922590.0, ans=0.05 2024-08-13 01:23:02,580 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 01:23:12,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1922690.0, ans=0.1 2024-08-13 01:23:15,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1922790.0, ans=0.125 2024-08-13 01:23:20,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1922790.0, ans=0.0 2024-08-13 01:23:27,706 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 01:23:42,777 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3900, loss[loss=0.1053, beats_loss=0.01026, ecapa_loss=0.0001732, whisper_loss=0.09329, over 20240.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001702, whisper_loss=0.09249, over 3875328.71 frames. ], batch size: 77, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:23:51,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1922990.0, ans=0.0 2024-08-13 01:24:02,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1923090.0, ans=0.125 2024-08-13 01:24:04,186 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.282e-01 2024-08-13 01:24:05,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1923090.0, ans=0.0 2024-08-13 01:24:23,155 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 01:24:28,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.595e+01 2.867e+01 3.243e+01 6.009e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-13 01:24:32,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1923290.0, ans=0.125 2024-08-13 01:24:34,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1923290.0, ans=0.125 2024-08-13 01:24:43,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1923390.0, ans=0.95 2024-08-13 01:24:46,850 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 01:24:51,264 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 01:24:52,320 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 3950, loss[loss=0.1214, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.1096, over 23837.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01071, ecapa_loss=0.0001702, whisper_loss=0.09296, over 3889930.48 frames. ], batch size: 91, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:24:55,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1923490.0, ans=0.5 2024-08-13 01:25:21,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1923690.0, ans=0.2 2024-08-13 01:25:30,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1923690.0, ans=0.125 2024-08-13 01:25:47,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1923890.0, ans=0.1 2024-08-13 01:25:54,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:25:55,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:26:02,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4000, loss[loss=0.08902, beats_loss=0.0149, ecapa_loss=0.00012, whisper_loss=0.07292, over 21897.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01075, ecapa_loss=0.0001701, whisper_loss=0.09252, over 3882280.75 frames. 
], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:26:09,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-13 01:26:16,593 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 01:26:17,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1924090.0, ans=10.0 2024-08-13 01:26:19,117 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 01:26:20,468 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 01:26:40,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1924190.0, ans=0.0 2024-08-13 01:26:47,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1924290.0, ans=0.0 2024-08-13 01:26:47,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1924290.0, ans=0.2 2024-08-13 01:26:47,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.537e+01 2.883e+01 3.271e+01 5.034e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-13 01:27:12,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4050, loss[loss=0.08684, beats_loss=0.01084, ecapa_loss=0.0001682, whisper_loss=0.07432, over 20832.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01073, ecapa_loss=0.0001704, whisper_loss=0.09326, over 3893600.18 frames. 
], batch size: 87, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:27:18,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1924490.0, ans=0.125 2024-08-13 01:27:24,855 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-13 01:27:26,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1924590.0, ans=0.125 2024-08-13 01:27:36,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1924590.0, ans=0.1 2024-08-13 01:27:39,219 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.018e+05 2024-08-13 01:27:44,507 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 01:27:55,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1924790.0, ans=0.125 2024-08-13 01:28:08,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1924890.0, ans=0.1 2024-08-13 01:28:12,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1924890.0, ans=0.0 2024-08-13 01:28:19,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1924890.0, ans=0.0 2024-08-13 01:28:21,343 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4100, loss[loss=0.1004, beats_loss=0.009783, ecapa_loss=0.0001774, whisper_loss=0.08884, over 19865.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01077, ecapa_loss=0.000171, whisper_loss=0.09331, over 3874410.00 frames. 
], batch size: 81, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:29:08,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.339e+01 2.647e+01 3.027e+01 3.702e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-13 01:29:10,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925290.0, ans=0.1 2024-08-13 01:29:19,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1925390.0, ans=0.0 2024-08-13 01:29:21,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1925390.0, ans=0.0 2024-08-13 01:29:32,679 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4150, loss[loss=0.1158, beats_loss=0.0102, ecapa_loss=0.0002164, whisper_loss=0.1034, over 15360.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01084, ecapa_loss=0.0001709, whisper_loss=0.09284, over 3869557.72 frames. ], batch size: 62, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:29:37,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1925490.0, ans=0.125 2024-08-13 01:29:51,195 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 01:29:54,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1925590.0, ans=0.125 2024-08-13 01:29:57,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1925590.0, ans=0.2 2024-08-13 01:30:01,351 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-13 01:30:10,959 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
39 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 01:30:16,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1925790.0, ans=0.125 2024-08-13 01:30:30,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925890.0, ans=0.1 2024-08-13 01:30:31,672 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 01:30:32,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1925890.0, ans=0.125 2024-08-13 01:30:40,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1925890.0, ans=0.2 2024-08-13 01:30:43,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4200, loss[loss=0.1012, beats_loss=0.01106, ecapa_loss=0.0001808, whisper_loss=0.08829, over 17370.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001706, whisper_loss=0.09223, over 3866327.63 frames. ], batch size: 73, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:30:43,452 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 01:30:53,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1925990.0, ans=0.125 2024-08-13 01:31:07,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1926090.0, ans=0.1 2024-08-13 01:31:28,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.387e+01 2.732e+01 2.995e+01 7.981e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-13 01:31:30,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1926290.0, ans=0.125 2024-08-13 01:31:37,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1926390.0, ans=0.07 2024-08-13 01:31:43,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1926390.0, ans=0.05 2024-08-13 01:31:52,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4250, loss[loss=0.1233, beats_loss=0.006329, ecapa_loss=0.0001892, whisper_loss=0.1151, over 15277.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001711, whisper_loss=0.09204, over 3857036.65 frames. ], batch size: 58, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:31:58,276 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 01:32:46,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1926790.0, ans=0.125 2024-08-13 01:32:47,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1926890.0, ans=0.0 2024-08-13 01:32:51,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1926890.0, ans=0.0 2024-08-13 01:33:02,224 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4300, loss[loss=0.09986, beats_loss=0.01083, ecapa_loss=0.0001587, whisper_loss=0.08744, over 20892.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001708, whisper_loss=0.09202, over 3881898.86 frames. ], batch size: 82, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:33:10,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1926990.0, ans=0.0 2024-08-13 01:33:10,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1926990.0, ans=0.125 2024-08-13 01:33:38,501 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:33:48,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.403e+01 2.611e+01 3.081e+01 4.718e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-13 01:34:04,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-13 01:34:09,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1927390.0, ans=0.0 2024-08-13 01:34:11,453 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4350, loss[loss=0.08409, beats_loss=0.01321, ecapa_loss=0.0001464, whisper_loss=0.06942, over 13578.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001706, whisper_loss=0.09116, over 3886529.23 frames. ], batch size: 56, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:34:19,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1927490.0, ans=0.125 2024-08-13 01:34:20,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2024-08-13 01:34:23,148 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 01:34:35,639 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 01:34:50,881 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 01:34:52,109 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-13 01:34:53,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1927790.0, ans=0.125 2024-08-13 01:35:00,387 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 01:35:08,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. 
limit=22.5 2024-08-13 01:35:21,040 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4400, loss[loss=0.1136, beats_loss=0.01247, ecapa_loss=0.0001018, whisper_loss=0.1001, over 17642.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001699, whisper_loss=0.09098, over 3866483.51 frames. ], batch size: 66, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:35:21,279 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 31 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 01:35:37,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1928090.0, ans=0.125 2024-08-13 01:35:48,724 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 01:35:51,662 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 26 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 01:35:52,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1928190.0, ans=0.0 2024-08-13 01:35:58,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0 2024-08-13 01:36:04,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1928290.0, ans=0.125 2024-08-13 01:36:05,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. 
limit=22.5 2024-08-13 01:36:06,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.420e+01 2.637e+01 3.058e+01 4.603e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 01:36:12,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1928290.0, ans=0.0 2024-08-13 01:36:13,873 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 01:36:14,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1928290.0, ans=0.0 2024-08-13 01:36:21,224 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 01:36:29,567 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 01:36:30,549 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4450, loss[loss=0.1068, beats_loss=0.01186, ecapa_loss=0.0001602, whisper_loss=0.09338, over 22394.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.0001684, whisper_loss=0.09052, over 3867806.39 frames. ], batch size: 89, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:36:34,533 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.65 vs. limit=22.5 2024-08-13 01:36:57,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1928690.0, ans=0.0 2024-08-13 01:37:01,887 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.448e+00 2024-08-13 01:37:09,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.39 vs. 
limit=15.0 2024-08-13 01:37:14,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1928790.0, ans=0.125 2024-08-13 01:37:15,693 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 01:37:19,889 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:37:24,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1928790.0, ans=0.125 2024-08-13 01:37:32,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1928890.0, ans=0.1 2024-08-13 01:37:37,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1928890.0, ans=0.0 2024-08-13 01:37:39,855 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4500, loss[loss=0.09721, beats_loss=0.01233, ecapa_loss=0.0001496, whisper_loss=0.08338, over 17154.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01097, ecapa_loss=0.000169, whisper_loss=0.09041, over 3880627.23 frames. 
], batch size: 66, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:37:44,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1928990.0, ans=0.0 2024-08-13 01:37:46,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1928990.0, ans=0.1 2024-08-13 01:37:49,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1928990.0, ans=0.0 2024-08-13 01:38:05,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1929090.0, ans=0.2 2024-08-13 01:38:27,274 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.366e+01 2.717e+01 3.132e+01 4.916e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-13 01:38:29,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1929290.0, ans=0.1 2024-08-13 01:38:38,952 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 01:38:39,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0 2024-08-13 01:38:49,587 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4550, loss[loss=0.08339, beats_loss=0.01142, ecapa_loss=0.0001879, whisper_loss=0.07008, over 18904.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001698, whisper_loss=0.09172, over 3890562.93 frames. 
], batch size: 80, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:38:55,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1929490.0, ans=0.1 2024-08-13 01:38:59,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-13 01:39:05,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1929590.0, ans=0.2 2024-08-13 01:39:06,657 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 01:39:07,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1929590.0, ans=0.125 2024-08-13 01:39:33,602 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 01:39:39,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1929790.0, ans=0.0 2024-08-13 01:39:40,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1929790.0, ans=0.0 2024-08-13 01:39:59,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4600, loss[loss=0.1069, beats_loss=0.01132, ecapa_loss=0.0001501, whisper_loss=0.09404, over 22888.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001695, whisper_loss=0.09115, over 3874103.99 frames. ], batch size: 90, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:40:00,391 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. 
limit=10.0 2024-08-13 01:40:18,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1930090.0, ans=0.0 2024-08-13 01:40:34,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-13 01:40:42,138 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 01:40:45,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2024-08-13 01:40:46,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.515e+01 2.755e+01 3.045e+01 4.770e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-13 01:40:55,922 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 28 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 01:41:07,979 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4650, loss[loss=0.0974, beats_loss=0.01344, ecapa_loss=0.0001338, whisper_loss=0.08262, over 21692.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01092, ecapa_loss=0.0001686, whisper_loss=0.09122, over 3877422.22 frames. 
], batch size: 84, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:41:14,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1930490.0, ans=0.0 2024-08-13 01:41:17,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1930490.0, ans=0.125 2024-08-13 01:41:22,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1930590.0, ans=0.1 2024-08-13 01:41:26,923 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.058e-02 2024-08-13 01:41:33,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1930590.0, ans=0.125 2024-08-13 01:41:47,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1930690.0, ans=0.125 2024-08-13 01:42:01,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-13 01:42:03,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1930890.0, ans=0.125 2024-08-13 01:42:05,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-13 01:42:07,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1930890.0, ans=0.0 2024-08-13 01:42:12,716 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
24 from LS+wenet, 23 from Vox, 39 from AS 2024-08-13 01:42:16,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4700, loss[loss=0.1161, beats_loss=0.01059, ecapa_loss=0.0001594, whisper_loss=0.1039, over 15912.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0109, ecapa_loss=0.0001685, whisper_loss=0.09193, over 3878687.85 frames. ], batch size: 61, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:42:51,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1931190.0, ans=0.0 2024-08-13 01:43:02,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=12.0 2024-08-13 01:43:03,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.542e+01 2.823e+01 3.098e+01 3.628e+02, threshold=5.646e+01, percent-clipped=2.0 2024-08-13 01:43:26,263 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4750, loss[loss=0.123, beats_loss=0.00867, ecapa_loss=0.0001587, whisper_loss=0.1127, over 18330.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001695, whisper_loss=0.09209, over 3875753.82 frames. ], batch size: 69, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:44:09,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1931790.0, ans=0.1 2024-08-13 01:44:17,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1931790.0, ans=0.1 2024-08-13 01:44:28,760 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts.
18 from LS+wenet, 20 from Vox, 28 from AS 2024-08-13 01:44:36,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1931890.0, ans=0.125 2024-08-13 01:44:42,358 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4800, loss[loss=0.1144, beats_loss=0.009954, ecapa_loss=0.0002115, whisper_loss=0.1023, over 21729.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.0001709, whisper_loss=0.09164, over 3877286.62 frames. ], batch size: 90, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:44:54,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1931990.0, ans=0.0 2024-08-13 01:44:57,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1931990.0, ans=0.2 2024-08-13 01:45:00,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1932090.0, ans=0.125 2024-08-13 01:45:05,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1932090.0, ans=0.125 2024-08-13 01:45:08,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1932090.0, ans=0.0 2024-08-13 01:45:24,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1932190.0, ans=0.2 2024-08-13 01:45:26,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1932190.0, ans=0.0 2024-08-13 01:45:49,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.506e+01 2.786e+01 3.078e+01 4.876e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-13 01:46:05,852 INFO [scaling.py:1024] (1/4) Whitening:
name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2024-08-13 01:46:21,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4850, loss[loss=0.1175, beats_loss=0.007654, ecapa_loss=0.0002175, whisper_loss=0.1077, over 13660.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001711, whisper_loss=0.09191, over 3899859.11 frames. ], batch size: 54, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:46:30,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1932490.0, ans=0.2 2024-08-13 01:47:09,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1932690.0, ans=0.0 2024-08-13 01:47:19,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2024-08-13 01:47:56,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1932890.0, ans=0.125 2024-08-13 01:48:06,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1932890.0, ans=0.05 2024-08-13 01:48:11,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4900, loss[loss=0.1198, beats_loss=0.008665, ecapa_loss=0.0001826, whisper_loss=0.1093, over 17495.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01085, ecapa_loss=0.0001707, whisper_loss=0.09294, over 3919921.43 frames. ], batch size: 68, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:48:21,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. 
limit=15.0 2024-08-13 01:48:25,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1932990.0, ans=0.0 2024-08-13 01:48:29,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1932990.0, ans=0.2 2024-08-13 01:48:47,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1933090.0, ans=0.125 2024-08-13 01:49:12,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1933190.0, ans=0.0 2024-08-13 01:49:20,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1933290.0, ans=0.125 2024-08-13 01:49:22,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1933290.0, ans=0.09899494936611666 2024-08-13 01:49:30,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1933290.0, ans=0.125 2024-08-13 01:49:32,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.462e+01 2.765e+01 3.056e+01 4.985e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-13 01:50:03,714 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 4950, loss[loss=0.09975, beats_loss=0.01118, ecapa_loss=0.0001536, whisper_loss=0.08703, over 20531.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001713, whisper_loss=0.0925, over 3886436.42 frames. 
], batch size: 81, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:50:49,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1933790.0, ans=0.0 2024-08-13 01:51:13,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1933890.0, ans=0.125 2024-08-13 01:51:15,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1933890.0, ans=0.04949747468305833 2024-08-13 01:51:16,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1933890.0, ans=0.125 2024-08-13 01:51:20,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5000, loss[loss=0.1006, beats_loss=0.01347, ecapa_loss=0.0001425, whisper_loss=0.08571, over 19291.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001692, whisper_loss=0.09251, over 3864898.49 frames. ], batch size: 72, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:51:38,331 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-13 01:51:39,086 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 19 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-13 01:51:42,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1934090.0, ans=0.09899494936611666 2024-08-13 01:51:58,638 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 01:52:00,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1934190.0, ans=0.2 2024-08-13 01:52:07,936 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
21 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 01:52:12,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1934290.0, ans=0.1 2024-08-13 01:52:13,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.373e+01 2.737e+01 3.184e+01 6.268e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 01:52:17,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1934290.0, ans=0.0 2024-08-13 01:52:22,466 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS 2024-08-13 01:52:22,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2024-08-13 01:52:27,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1934390.0, ans=0.0 2024-08-13 01:52:35,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1934390.0, ans=0.125 2024-08-13 01:52:36,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1934390.0, ans=0.125 2024-08-13 01:52:39,237 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5050, loss[loss=0.1028, beats_loss=0.0115, ecapa_loss=0.0001729, whisper_loss=0.08954, over 22229.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001692, whisper_loss=0.09208, over 3840775.47 frames. 
], batch size: 92, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:52:55,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1934590.0, ans=0.2 2024-08-13 01:53:15,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0 2024-08-13 01:53:22,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1934690.0, ans=0.04949747468305833 2024-08-13 01:53:24,789 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 01:53:42,927 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 01:53:46,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1934890.0, ans=0.0 2024-08-13 01:54:00,068 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5100, loss[loss=0.1216, beats_loss=0.01044, ecapa_loss=0.0001864, whisper_loss=0.1093, over 22688.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001689, whisper_loss=0.0927, over 3838875.77 frames. ], batch size: 92, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:54:02,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. 
limit=15.0 2024-08-13 01:54:26,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1935090.0, ans=0.0 2024-08-13 01:54:28,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1935090.0, ans=0.125 2024-08-13 01:54:31,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1935190.0, ans=0.035 2024-08-13 01:54:40,352 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-13 01:54:46,732 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:54:47,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1935190.0, ans=0.125 2024-08-13 01:54:56,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.475e+01 2.679e+01 3.018e+01 4.914e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-13 01:55:02,032 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 14 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 01:55:08,497 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 32 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-13 01:55:15,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-13 01:55:16,364 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 01:55:19,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. 
limit=6.0 2024-08-13 01:55:22,010 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5150, loss[loss=0.1251, beats_loss=0.009278, ecapa_loss=0.000187, whisper_loss=0.114, over 22490.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01084, ecapa_loss=0.0001688, whisper_loss=0.09265, over 3849308.89 frames. ], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:55:32,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1935490.0, ans=0.0 2024-08-13 01:55:34,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1935490.0, ans=0.0 2024-08-13 01:55:50,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1935590.0, ans=0.125 2024-08-13 01:55:50,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1935590.0, ans=0.125 2024-08-13 01:56:10,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1935690.0, ans=0.125 2024-08-13 01:56:23,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1935790.0, ans=0.1 2024-08-13 01:56:31,658 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 01:56:35,579 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-13 01:56:35,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1935890.0, ans=0.0 2024-08-13 01:56:44,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1935890.0, ans=0.125 2024-08-13 01:56:47,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5200, loss[loss=0.1052, beats_loss=0.008478, ecapa_loss=0.0002013, whisper_loss=0.09471, over 19036.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0108, ecapa_loss=0.0001704, whisper_loss=0.09279, over 3872384.59 frames. ], batch size: 76, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:57:01,370 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-13 01:57:11,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2024-08-13 01:57:15,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1936090.0, ans=0.0 2024-08-13 01:57:34,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-13 01:57:42,202 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.433e+01 2.676e+01 3.023e+01 1.012e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-13 01:58:03,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. 
limit=15.0 2024-08-13 01:58:05,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1936390.0, ans=0.125 2024-08-13 01:58:08,516 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5250, loss[loss=0.1045, beats_loss=0.01007, ecapa_loss=0.0001778, whisper_loss=0.09262, over 19141.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01079, ecapa_loss=0.0001694, whisper_loss=0.09272, over 3855999.26 frames. ], batch size: 77, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:58:09,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:11,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-13 01:58:24,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.96 vs. limit=22.5 2024-08-13 01:58:35,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1936590.0, ans=0.125 2024-08-13 01:58:38,033 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 26 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-13 01:58:42,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1936690.0, ans=0.125 2024-08-13 01:58:44,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2024-08-13 01:58:46,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1936690.0, ans=0.125 2024-08-13 01:58:54,909 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 27 from Vox, 29 from AS 2024-08-13 01:58:55,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1936690.0, ans=0.125 2024-08-13 01:58:56,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1936790.0, ans=0.125 2024-08-13 01:58:58,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1936790.0, ans=0.2 2024-08-13 01:59:27,148 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS 2024-08-13 01:59:30,438 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5300, loss[loss=0.1101, beats_loss=0.01326, ecapa_loss=0.0001088, whisper_loss=0.09577, over 15178.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001688, whisper_loss=0.09251, over 3880962.08 frames. ], batch size: 58, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:59:38,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1936990.0, ans=0.125 2024-08-13 02:00:13,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1937190.0, ans=0.125 2024-08-13 02:00:15,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1937190.0, ans=0.0 2024-08-13 02:00:25,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.483e+01 2.816e+01 3.213e+01 1.142e+02, threshold=5.632e+01, percent-clipped=3.0 2024-08-13 02:00:28,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1937290.0, ans=22.5 2024-08-13 02:00:36,355 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1937390.0, ans=0.1 2024-08-13 02:00:40,602 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 02:00:51,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5350, loss[loss=0.09274, beats_loss=0.01325, ecapa_loss=0.0001749, whisper_loss=0.07774, over 21822.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001707, whisper_loss=0.09206, over 3853721.26 frames. ], batch size: 90, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:01:05,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1937490.0, ans=0.07 2024-08-13 02:01:06,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:12,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:14,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:39,524 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 21 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 02:01:40,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-13 02:01:49,325 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-13 02:01:53,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-13 02:01:54,087 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
28 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 02:02:13,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5400, loss[loss=0.1133, beats_loss=0.01003, ecapa_loss=0.0001447, whisper_loss=0.1018, over 16113.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001701, whisper_loss=0.09239, over 3839779.73 frames. ], batch size: 61, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:02:21,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1937990.0, ans=0.1 2024-08-13 02:02:24,715 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 02:02:35,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1938090.0, ans=0.125 2024-08-13 02:02:39,802 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 from AS 2024-08-13 02:03:02,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1938290.0, ans=0.0 2024-08-13 02:03:04,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.20 vs. limit=15.0 2024-08-13 02:03:09,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.492e+01 2.751e+01 3.252e+01 5.304e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-13 02:03:11,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1938290.0, ans=0.0 2024-08-13 02:03:12,586 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS 2024-08-13 02:03:34,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. 
limit=15.0 2024-08-13 02:03:37,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5450, loss[loss=0.09982, beats_loss=0.009012, ecapa_loss=0.000187, whisper_loss=0.08894, over 17163.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01074, ecapa_loss=0.0001696, whisper_loss=0.09234, over 3870499.74 frames. ], batch size: 69, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:03:44,612 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 02:03:52,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1938590.0, ans=0.0 2024-08-13 02:04:00,220 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 02:04:05,801 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 02:04:34,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-13 02:04:45,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1938890.0, ans=0.0 2024-08-13 02:04:59,451 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5500, loss[loss=0.09771, beats_loss=0.01263, ecapa_loss=0.000169, whisper_loss=0.08339, over 22638.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001693, whisper_loss=0.092, over 3884018.68 frames. ], batch size: 95, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:05:16,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1939090.0, ans=0.2 2024-08-13 02:05:25,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.03 vs. 
limit=22.5 2024-08-13 02:05:52,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.471e+01 2.738e+01 3.080e+01 7.605e+01, threshold=5.476e+01, percent-clipped=2.0 2024-08-13 02:06:01,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1939390.0, ans=0.1 2024-08-13 02:06:05,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1939390.0, ans=0.125 2024-08-13 02:06:08,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1939390.0, ans=0.125 2024-08-13 02:06:11,681 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-13 02:06:18,517 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5550, loss[loss=0.1066, beats_loss=0.01069, ecapa_loss=0.0001757, whisper_loss=0.09414, over 21653.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001696, whisper_loss=0.09215, over 3885726.78 frames. ], batch size: 88, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:06:58,601 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 02:07:04,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2024-08-13 02:07:06,865 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 02:07:09,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1939790.0, ans=0.0 2024-08-13 02:07:17,074 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
12 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 02:07:38,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5600, loss[loss=0.1177, beats_loss=0.01168, ecapa_loss=0.0001368, whisper_loss=0.1046, over 23487.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01069, ecapa_loss=0.0001702, whisper_loss=0.09286, over 3863309.95 frames. ], batch size: 90, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:07:44,970 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 02:07:45,505 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=12.0 2024-08-13 02:07:46,722 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 02:07:52,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1939990.0, ans=0.125 2024-08-13 02:08:06,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1940090.0, ans=0.1 2024-08-13 02:08:12,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.32 vs. 
limit=22.5 2024-08-13 02:08:15,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1940190.0, ans=0.5 2024-08-13 02:08:35,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.482e+01 2.705e+01 3.003e+01 6.205e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 02:08:39,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1940290.0, ans=0.125 2024-08-13 02:09:01,716 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5650, loss[loss=0.09639, beats_loss=0.01176, ecapa_loss=0.0002118, whisper_loss=0.08251, over 20474.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01087, ecapa_loss=0.0001703, whisper_loss=0.09229, over 3910371.10 frames. ], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:09:19,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1940590.0, ans=0.0 2024-08-13 02:09:36,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1940690.0, ans=0.125 2024-08-13 02:09:47,630 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 02:10:03,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0 2024-08-13 02:10:17,474 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 02:10:22,411 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5700, loss[loss=0.1193, beats_loss=0.007859, ecapa_loss=0.0002026, whisper_loss=0.1094, over 19022.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01093, ecapa_loss=0.0001711, whisper_loss=0.0913, over 3923942.86 frames. 
], batch size: 75, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:10:54,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1941190.0, ans=0.125 2024-08-13 02:11:02,436 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 02:11:15,106 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 02:11:16,724 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.518e+01 2.759e+01 3.173e+01 1.965e+02, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 02:11:36,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-13 02:11:41,420 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5750, loss[loss=0.1139, beats_loss=0.01061, ecapa_loss=0.0001591, whisper_loss=0.1017, over 14169.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.0001713, whisper_loss=0.09127, over 3919791.56 frames. ], batch size: 55, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:11:50,828 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.036e+01 2024-08-13 02:11:52,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1941490.0, ans=0.125 2024-08-13 02:12:07,829 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-13 02:12:28,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1941790.0, ans=0.0 2024-08-13 02:12:42,919 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.097e-02 2024-08-13 02:12:50,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=12.0 2024-08-13 02:13:02,524 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5800, loss[loss=0.08898, beats_loss=0.0127, ecapa_loss=0.0001943, whisper_loss=0.07434, over 20047.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001708, whisper_loss=0.09165, over 3923853.70 frames. ], batch size: 85, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:13:09,303 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 23 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-13 02:13:28,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1942090.0, ans=0.0 2024-08-13 02:13:35,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1942190.0, ans=0.1 2024-08-13 02:13:39,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1942190.0, ans=0.1 2024-08-13 02:13:48,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-08-13 02:13:50,125 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 02:13:56,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1942290.0, ans=0.125 2024-08-13 02:13:57,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.443e+01 2.748e+01 3.161e+01 4.611e+01, threshold=5.495e+01, percent-clipped=0.0 2024-08-13 02:14:06,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1942290.0, ans=0.0 2024-08-13 02:14:09,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2024-08-13 02:14:17,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1942390.0, ans=0.125 2024-08-13 02:14:24,578 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5850, loss[loss=0.09051, beats_loss=0.0142, ecapa_loss=0.0001681, whisper_loss=0.07462, over 22365.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01102, ecapa_loss=0.0001715, whisper_loss=0.09091, over 3936367.61 frames. ], batch size: 93, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:14:26,041 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 02:14:29,350 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-13 02:14:41,684 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 02:14:46,893 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 02:14:52,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. 
limit=22.5 2024-08-13 02:15:17,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1942790.0, ans=0.2 2024-08-13 02:15:35,306 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 02:15:43,733 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 14 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 02:15:47,595 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5900, loss[loss=0.09435, beats_loss=0.01154, ecapa_loss=0.0002099, whisper_loss=0.08071, over 20782.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01111, ecapa_loss=0.0001704, whisper_loss=0.09031, over 3916273.71 frames. ], batch size: 92, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:15:55,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1942990.0, ans=0.125 2024-08-13 02:16:25,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1943190.0, ans=0.0 2024-08-13 02:16:40,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.528e+01 2.790e+01 3.084e+01 1.766e+02, threshold=5.581e+01, percent-clipped=1.0 2024-08-13 02:17:07,177 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 5950, loss[loss=0.1019, beats_loss=0.01291, ecapa_loss=0.0001747, whisper_loss=0.08727, over 21592.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01103, ecapa_loss=0.0001702, whisper_loss=0.09057, over 3901813.45 frames. ], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:17:18,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. 
limit=15.0 2024-08-13 02:17:24,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=1943590.0, ans=22.5 2024-08-13 02:17:46,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1943690.0, ans=0.04949747468305833 2024-08-13 02:17:49,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1943690.0, ans=0.125 2024-08-13 02:18:02,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1943790.0, ans=0.125 2024-08-13 02:18:11,386 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 02:18:12,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1943890.0, ans=0.2 2024-08-13 02:18:15,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-08-13 02:18:28,332 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6000, loss[loss=0.09653, beats_loss=0.01344, ecapa_loss=0.0001492, whisper_loss=0.08159, over 22318.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01101, ecapa_loss=0.0001704, whisper_loss=0.09076, over 3873019.73 frames. ], batch size: 88, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:18:28,332 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 02:19:07,064 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005835, whisper_loss=0.2494, over 922467.00 frames. 
2024-08-13 02:19:25,538 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on SV_voxceleb1: loss=0.004586, beats_loss=0, ecapa_loss=0.0004586, whisper_loss=0, over 939242.00 frames. 2024-08-13 02:20:36,805 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6365, 1.6831, 1.9166, 1.1558], device='cuda:1') 2024-08-13 02:21:14,510 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on AT_audioset: loss=0.02397, beats_loss=0.02397, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 02:21:14,519 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 02:21:16,161 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 02:21:17,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1943990.0, ans=0.0 2024-08-13 02:21:24,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2024-08-13 02:21:41,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1944090.0, ans=0.125 2024-08-13 02:21:44,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1944090.0, ans=0.0 2024-08-13 02:21:50,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=12.0 2024-08-13 02:22:10,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.472e+01 2.800e+01 3.130e+01 4.518e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-13 02:22:13,874 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 02:22:14,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1944290.0, ans=0.125 2024-08-13 02:22:19,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2024-08-13 02:22:29,941 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 02:22:35,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6050, loss[loss=0.1016, beats_loss=0.01176, ecapa_loss=0.0001273, whisper_loss=0.08857, over 23678.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01092, ecapa_loss=0.0001694, whisper_loss=0.09121, over 3857493.64 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:22:39,978 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 02:22:40,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1944490.0, ans=0.125 2024-08-13 02:22:49,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1944490.0, ans=0.0 2024-08-13 02:22:49,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1944490.0, ans=0.0 2024-08-13 02:22:58,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1944590.0, ans=0.0 2024-08-13 02:23:19,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1944690.0, ans=0.125 2024-08-13 02:23:25,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1944790.0, ans=0.0 2024-08-13 02:23:33,273 
INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1944790.0, ans=0.0 2024-08-13 02:23:58,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6100, loss[loss=0.09291, beats_loss=0.0126, ecapa_loss=0.0001754, whisper_loss=0.07856, over 21900.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001704, whisper_loss=0.09119, over 3844587.12 frames. ], batch size: 93, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:24:15,249 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 02:24:30,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:38,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:51,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1945290.0, ans=0.125 2024-08-13 02:24:53,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.564e+01 2.945e+01 3.314e+01 6.954e+01, threshold=5.890e+01, percent-clipped=1.0 2024-08-13 02:25:13,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-13 02:25:19,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1945490.0, ans=0.125 2024-08-13 02:25:21,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6150, loss[loss=0.1096, beats_loss=0.01116, ecapa_loss=0.0001511, whisper_loss=0.09693, over 22092.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001693, whisper_loss=0.09133, over 3855766.07 frames. 
], batch size: 85, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:25:23,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1945490.0, ans=0.09899494936611666 2024-08-13 02:25:29,375 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 36 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 02:25:30,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1945490.0, ans=0.0 2024-08-13 02:25:32,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1945490.0, ans=0.125 2024-08-13 02:25:34,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-13 02:25:47,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1945590.0, ans=0.125 2024-08-13 02:26:18,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1945790.0, ans=0.125 2024-08-13 02:26:28,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1945890.0, ans=0.125 2024-08-13 02:26:28,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1945890.0, ans=0.2 2024-08-13 02:26:29,369 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 38 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 02:26:37,658 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 02:26:42,039 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6200, loss[loss=0.09509, beats_loss=0.01148, ecapa_loss=0.000127, whisper_loss=0.08233, over 22933.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001688, whisper_loss=0.09102, over 3861659.80 frames. ], batch size: 88, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:27:00,227 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 02:27:02,009 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 02:27:19,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1946190.0, ans=0.125 2024-08-13 02:27:39,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.499e+01 2.801e+01 3.134e+01 4.474e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-13 02:27:47,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1946390.0, ans=0.125 2024-08-13 02:27:56,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.23 vs. limit=22.5 2024-08-13 02:28:00,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1946390.0, ans=0.125 2024-08-13 02:28:05,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6250, loss[loss=0.09483, beats_loss=0.0114, ecapa_loss=0.000176, whisper_loss=0.08167, over 17073.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001698, whisper_loss=0.09126, over 3856697.70 frames. ], batch size: 70, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:28:08,498 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 34 from Vox, 27 fro AS 2024-08-13 02:28:14,946 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 02:28:34,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1946590.0, ans=0.0 2024-08-13 02:28:40,227 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 02:28:45,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-08-13 02:29:07,191 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 02:29:24,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1946890.0, ans=0.1 2024-08-13 02:29:26,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6300, loss[loss=0.09603, beats_loss=0.008572, ecapa_loss=0.0001949, whisper_loss=0.08551, over 16806.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0001695, whisper_loss=0.09132, over 3831891.32 frames. ], batch size: 67, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:29:45,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1947090.0, ans=0.2 2024-08-13 02:29:47,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1947090.0, ans=0.05 2024-08-13 02:29:59,498 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 
27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 02:30:09,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1947190.0, ans=0.1 2024-08-13 02:30:20,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.428e+01 2.719e+01 3.075e+01 5.745e+01, threshold=5.438e+01, percent-clipped=1.0 2024-08-13 02:30:27,661 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 02:30:29,032 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 02:30:35,977 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 02:30:42,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1947390.0, ans=0.0 2024-08-13 02:30:45,823 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6350, loss[loss=0.06685, beats_loss=0.01385, ecapa_loss=0.0001586, whisper_loss=0.05141, over 12903.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001701, whisper_loss=0.09112, over 3804280.09 frames. ], batch size: 54, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:31:38,146 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 02:31:46,703 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 02:31:58,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1947890.0, ans=0.125 2024-08-13 02:32:06,061 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 02:32:07,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6400, loss[loss=0.0865, beats_loss=0.01237, ecapa_loss=0.0001697, whisper_loss=0.07244, over 21396.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01092, ecapa_loss=0.0001695, whisper_loss=0.09121, over 3823858.02 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:32:49,078 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 02:32:59,380 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2024-08-13 02:33:03,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-13 02:33:04,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.410e+01 2.725e+01 3.146e+01 5.039e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 02:33:05,330 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:33:11,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2024-08-13 02:33:31,107 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6450, loss[loss=0.09349, beats_loss=0.01185, ecapa_loss=0.0001858, whisper_loss=0.07978, over 20330.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001696, whisper_loss=0.09204, over 3826721.16 frames. ], batch size: 86, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:33:31,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. 
limit=15.0 2024-08-13 02:33:43,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1948490.0, ans=0.025 2024-08-13 02:33:43,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.02 vs. limit=22.5 2024-08-13 02:34:11,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1948690.0, ans=0.125 2024-08-13 02:34:20,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2024-08-13 02:34:23,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1948790.0, ans=0.0 2024-08-13 02:34:32,682 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-13 02:34:40,073 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 02:34:55,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6500, loss[loss=0.1052, beats_loss=0.01125, ecapa_loss=0.0001813, whisper_loss=0.0921, over 20778.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001692, whisper_loss=0.09163, over 3831348.33 frames. ], batch size: 81, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:34:57,067 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-13 02:35:00,044 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 02:35:05,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. 
limit=15.0 2024-08-13 02:35:39,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1949190.0, ans=0.125 2024-08-13 02:35:51,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.473e+01 2.682e+01 2.925e+01 4.435e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-13 02:35:51,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-08-13 02:35:54,160 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 02:36:01,786 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 02:36:17,551 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6550, loss[loss=0.1136, beats_loss=0.00929, ecapa_loss=0.0001925, whisper_loss=0.1024, over 21457.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01084, ecapa_loss=0.0001704, whisper_loss=0.09229, over 3853169.10 frames. ], batch size: 88, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:36:22,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-08-13 02:36:30,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2024-08-13 02:36:58,951 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 02:37:29,993 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-13 02:37:41,333 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6600, loss[loss=0.0863, beats_loss=0.01044, ecapa_loss=0.0002234, whisper_loss=0.07363, over 18499.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001716, whisper_loss=0.09165, over 3883106.91 frames. ], batch size: 77, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:37:52,304 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 02:38:22,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1950090.0, ans=0.1 2024-08-13 02:39:00,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-13 02:39:02,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1950190.0, ans=0.125 2024-08-13 02:39:06,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1950290.0, ans=0.125 2024-08-13 02:39:14,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.510e+01 2.757e+01 3.096e+01 4.067e+01, threshold=5.514e+01, percent-clipped=0.0 2024-08-13 02:39:18,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.81 vs. limit=15.0 2024-08-13 02:39:33,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=12.0 2024-08-13 02:39:33,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 02:39:39,268 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6650, loss[loss=0.1045, beats_loss=0.01059, ecapa_loss=0.000154, whisper_loss=0.09236, over 22581.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001704, whisper_loss=0.09158, over 3905939.93 frames. ], batch size: 89, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:39:51,828 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 02:40:00,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-08-13 02:40:00,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=22.5 2024-08-13 02:40:31,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1950690.0, ans=0.125 2024-08-13 02:40:58,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1950890.0, ans=0.1 2024-08-13 02:40:58,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1950890.0, ans=0.125 2024-08-13 02:41:03,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1950890.0, ans=0.0 2024-08-13 02:41:11,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1950890.0, ans=0.125 2024-08-13 02:41:16,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6700, loss[loss=0.1215, beats_loss=0.008616, ecapa_loss=0.0001857, whisper_loss=0.111, over 16583.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001697, whisper_loss=0.0921, over 3932310.26 frames. ], batch size: 65, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:41:24,689 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 02:41:25,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0 2024-08-13 02:41:26,516 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 02:41:30,849 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 02:41:40,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1951090.0, ans=0.0 2024-08-13 02:42:05,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1951190.0, ans=0.125 2024-08-13 02:42:08,055 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 02:42:19,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1951290.0, ans=0.0 2024-08-13 02:42:23,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.594e+01 2.894e+01 3.478e+01 5.381e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-13 02:42:28,348 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 02:42:40,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1951390.0, ans=0.125 2024-08-13 02:42:50,839 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 02:43:00,125 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6750, loss[loss=0.09107, beats_loss=0.01305, ecapa_loss=0.0001962, whisper_loss=0.07605, over 18569.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001698, whisper_loss=0.09117, over 3879478.63 frames. 
], batch size: 81, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:43:07,706 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-13 02:43:07,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1951490.0, ans=0.1 2024-08-13 02:43:09,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1951490.0, ans=0.125 2024-08-13 02:43:09,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1951490.0, ans=0.5 2024-08-13 02:43:16,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1951490.0, ans=0.0 2024-08-13 02:43:19,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1951490.0, ans=0.125 2024-08-13 02:43:21,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1951590.0, ans=0.0 2024-08-13 02:43:44,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1951690.0, ans=0.0 2024-08-13 02:43:52,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-13 02:43:56,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-13 02:43:58,836 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 02:44:26,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1951790.0, ans=0.0 2024-08-13 02:44:34,624 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 37 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 02:44:36,066 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 02:44:56,309 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 02:44:57,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6800, loss[loss=0.1232, beats_loss=0.008675, ecapa_loss=0.0001473, whisper_loss=0.113, over 16205.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.0001709, whisper_loss=0.09223, over 3895637.81 frames. ], batch size: 59, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:45:00,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2024-08-13 02:45:10,359 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 02:45:17,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1951990.0, ans=0.125 2024-08-13 02:45:18,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0 2024-08-13 02:45:37,865 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 02:45:41,157 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 02:45:56,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1952190.0, ans=15.0 2024-08-13 02:46:04,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1952190.0, ans=0.125 2024-08-13 02:46:16,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.452e+01 2.716e+01 3.076e+01 4.037e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 02:46:29,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1952390.0, ans=0.125 2024-08-13 02:46:31,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1952390.0, ans=0.2 2024-08-13 02:46:44,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1952390.0, ans=0.2 2024-08-13 02:46:52,766 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6850, loss[loss=0.1133, beats_loss=0.01158, ecapa_loss=0.0001695, whisper_loss=0.1, over 22807.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.0001691, whisper_loss=0.09227, over 3874324.42 frames. ], batch size: 91, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:46:53,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-08-13 02:47:34,499 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 02:47:43,127 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 02:47:53,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1952690.0, ans=6.0 2024-08-13 02:47:53,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2024-08-13 02:48:08,399 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 02:48:12,232 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 37 from Vox, 35 fro AS 2024-08-13 02:48:13,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1952790.0, ans=0.0 2024-08-13 02:48:43,141 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6900, loss[loss=0.09436, beats_loss=0.0116, ecapa_loss=0.0001995, whisper_loss=0.08077, over 21395.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001697, whisper_loss=0.09225, over 3884382.34 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:48:46,206 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 02:49:11,231 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 02:49:26,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1953190.0, ans=0.04949747468305833 2024-08-13 02:49:41,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.589e+01 2.754e+01 3.182e+01 2.951e+02, threshold=5.508e+01, percent-clipped=1.0 2024-08-13 02:49:47,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1953290.0, ans=0.125 2024-08-13 02:49:49,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1953390.0, ans=0.125 2024-08-13 02:49:55,838 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 02:49:57,466 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 02:50:07,483 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 6950, loss[loss=0.1186, beats_loss=0.0112, ecapa_loss=0.0001351, whisper_loss=0.106, over 23499.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001677, whisper_loss=0.09206, over 3855355.24 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:50:08,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1953490.0, ans=0.125 2024-08-13 02:50:08,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. 
limit=15.0 2024-08-13 02:50:19,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1953490.0, ans=0.5 2024-08-13 02:50:25,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-08-13 02:50:42,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1953590.0, ans=0.0 2024-08-13 02:50:43,898 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 02:50:51,367 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 22 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 02:51:00,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1953690.0, ans=0.025 2024-08-13 02:51:14,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-13 02:51:40,645 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 02:51:42,577 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7000, loss[loss=0.1098, beats_loss=0.01247, ecapa_loss=0.0001868, whisper_loss=0.09543, over 16679.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001685, whisper_loss=0.09252, over 3876395.48 frames. ], batch size: 70, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:52:02,051 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 02:52:17,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1954090.0, ans=0.0 2024-08-13 02:52:32,713 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 02:52:34,242 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 02:52:37,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1954190.0, ans=0.125 2024-08-13 02:52:47,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1954290.0, ans=0.95 2024-08-13 02:52:48,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.444e+01 2.710e+01 2.918e+01 4.538e+01, threshold=5.419e+01, percent-clipped=0.0 2024-08-13 02:53:04,591 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 02:53:04,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1954390.0, ans=0.0 2024-08-13 02:53:07,911 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 02:53:11,414 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.554e+01 2024-08-13 02:53:16,208 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7050, loss[loss=0.1012, beats_loss=0.01087, ecapa_loss=0.0001564, whisper_loss=0.0888, over 20185.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001691, whisper_loss=0.09271, over 3893296.94 frames. 
], batch size: 80, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:53:34,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1954590.0, ans=0.1 2024-08-13 02:53:44,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1954590.0, ans=0.1 2024-08-13 02:53:46,513 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 02:53:55,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1954690.0, ans=10.0 2024-08-13 02:53:56,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-08-13 02:54:01,273 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 02:54:03,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-08-13 02:54:08,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1954690.0, ans=0.2 2024-08-13 02:54:19,597 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 02:54:25,084 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 02:54:48,368 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7100, loss[loss=0.09194, beats_loss=0.01103, ecapa_loss=0.000177, whisper_loss=0.07914, over 13811.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001676, whisper_loss=0.09202, over 3877500.88 frames. 
], batch size: 56, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:55:13,751 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 02:55:20,800 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 02:55:34,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1955190.0, ans=0.125 2024-08-13 02:55:35,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1955190.0, ans=0.125 2024-08-13 02:55:45,551 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 02:55:48,666 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0 2024-08-13 02:55:49,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1955290.0, ans=0.035 2024-08-13 02:55:52,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.441e+01 2.743e+01 3.182e+01 6.176e+01, threshold=5.486e+01, percent-clipped=2.0 2024-08-13 02:56:06,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1955390.0, ans=0.0 2024-08-13 02:56:14,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1955390.0, ans=0.125 2024-08-13 02:56:17,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1955390.0, ans=0.125 2024-08-13 02:56:20,293 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7150, loss[loss=0.08602, beats_loss=0.01204, ecapa_loss=0.0002034, whisper_loss=0.07194, over 
15182.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001687, whisper_loss=0.09199, over 3896658.24 frames. ], batch size: 64, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:56:41,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1955590.0, ans=0.1 2024-08-13 02:57:04,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1955690.0, ans=0.0 2024-08-13 02:57:29,865 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 02:57:35,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1955890.0, ans=0.125 2024-08-13 02:57:53,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7200, loss[loss=0.09679, beats_loss=0.01177, ecapa_loss=0.0001356, whisper_loss=0.08367, over 18021.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01102, ecapa_loss=0.0001675, whisper_loss=0.09153, over 3904498.86 frames. ], batch size: 70, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:58:23,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1956090.0, ans=0.2 2024-08-13 02:58:49,601 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-13 02:58:51,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1956290.0, ans=0.0 2024-08-13 02:58:56,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.408e+01 2.678e+01 2.996e+01 6.633e+01, threshold=5.357e+01, percent-clipped=2.0 2024-08-13 02:59:19,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.69 vs. 
limit=15.0 2024-08-13 02:59:23,715 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7250, loss[loss=0.1119, beats_loss=0.0118, ecapa_loss=0.0001561, whisper_loss=0.09852, over 16934.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.0001684, whisper_loss=0.09101, over 3909747.48 frames. ], batch size: 67, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:59:53,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1956590.0, ans=0.2 2024-08-13 03:00:10,599 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-13 03:00:16,961 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 03:00:19,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1956790.0, ans=0.125 2024-08-13 03:00:53,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7300, loss[loss=0.09635, beats_loss=0.008784, ecapa_loss=0.0001988, whisper_loss=0.08558, over 20961.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01102, ecapa_loss=0.0001678, whisper_loss=0.09136, over 3939239.76 frames. ], batch size: 84, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:00:54,990 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 03:01:04,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2024-08-13 03:01:11,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1957090.0, ans=0.0 2024-08-13 03:01:36,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=15.0 2024-08-13 03:01:36,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-13 03:01:37,643 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 03:01:44,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1957190.0, ans=0.125 2024-08-13 03:01:46,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1957290.0, ans=0.125 2024-08-13 03:01:55,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.448e+01 2.774e+01 3.121e+01 5.439e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-13 03:01:56,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1957290.0, ans=0.2 2024-08-13 03:02:20,998 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7350, loss[loss=0.1155, beats_loss=0.007922, ecapa_loss=0.0001902, whisper_loss=0.1057, over 17434.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01098, ecapa_loss=0.0001675, whisper_loss=0.09114, over 3901903.49 frames. ], batch size: 70, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:02:22,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-13 03:02:28,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1957490.0, ans=0.125 2024-08-13 03:02:49,810 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 03:03:06,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1957690.0, ans=0.125 2024-08-13 03:03:10,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1957690.0, ans=0.1 2024-08-13 03:03:17,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1957790.0, ans=0.125 2024-08-13 03:03:39,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1957890.0, ans=0.0 2024-08-13 03:03:44,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2024-08-13 03:03:45,631 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7400, loss[loss=0.1124, beats_loss=0.008866, ecapa_loss=0.0002453, whisper_loss=0.1011, over 20988.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01094, ecapa_loss=0.0001682, whisper_loss=0.09151, over 3883737.06 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:03:45,777 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 38 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 03:03:54,692 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.738e-01 2024-08-13 03:03:56,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5 2024-08-13 03:04:02,302 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 03:04:10,776 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 03:04:17,293 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 03:04:20,416 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 03:04:20,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1958190.0, ans=0.125 2024-08-13 03:04:23,585 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 03:04:43,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-13 03:04:44,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.515e+01 2.775e+01 3.372e+01 5.725e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 03:05:08,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1958490.0, ans=0.125 2024-08-13 03:05:09,115 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7450, loss[loss=0.1087, beats_loss=0.0105, ecapa_loss=0.000184, whisper_loss=0.09632, over 20898.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001687, whisper_loss=0.09164, over 3895845.43 frames. ], batch size: 84, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:05:11,213 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 03:05:41,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1958690.0, ans=0.125 2024-08-13 03:05:54,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.65 vs. 
limit=15.0 2024-08-13 03:05:55,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-08-13 03:06:05,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1958790.0, ans=0.0 2024-08-13 03:06:05,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1958790.0, ans=0.2 2024-08-13 03:06:31,499 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7500, loss[loss=0.1094, beats_loss=0.01062, ecapa_loss=0.0001839, whisper_loss=0.09696, over 22910.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01073, ecapa_loss=0.0001705, whisper_loss=0.09256, over 3879893.52 frames. ], batch size: 91, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:06:34,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1958990.0, ans=0.2 2024-08-13 03:06:49,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1959090.0, ans=0.0 2024-08-13 03:06:53,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1959090.0, ans=0.025 2024-08-13 03:07:08,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1959190.0, ans=0.125 2024-08-13 03:07:10,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-13 03:07:18,907 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 03:07:24,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1959290.0, ans=0.1 2024-08-13 03:07:28,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.459e+01 2.697e+01 3.000e+01 4.880e+01, threshold=5.394e+01, percent-clipped=0.0 2024-08-13 03:07:29,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1959290.0, ans=0.125 2024-08-13 03:07:52,929 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7550, loss[loss=0.08447, beats_loss=0.01025, ecapa_loss=0.0001925, whisper_loss=0.07229, over 12994.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01076, ecapa_loss=0.0001697, whisper_loss=0.09275, over 3867788.92 frames. ], batch size: 54, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:08:08,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2024-08-13 03:08:11,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1959590.0, ans=0.0 2024-08-13 03:08:27,844 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 03:08:28,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1959690.0, ans=0.0 2024-08-13 03:08:50,120 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 03:08:51,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1959790.0, ans=0.125 2024-08-13 03:08:51,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1959790.0, ans=0.125 2024-08-13 03:08:57,021 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 03:09:11,666 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7600, loss[loss=0.08093, beats_loss=0.01534, ecapa_loss=0.0001884, whisper_loss=0.06371, over 21022.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.0001694, whisper_loss=0.09195, over 3828884.56 frames. ], batch size: 92, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:09:35,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-13 03:09:58,540 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 03:10:08,258 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.552e+01 2.815e+01 3.111e+01 1.865e+02, threshold=5.629e+01, percent-clipped=3.0 2024-08-13 03:10:10,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2024-08-13 03:10:11,356 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 03:10:14,299 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 33 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 03:10:32,103 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7650, loss[loss=0.09588, beats_loss=0.01065, ecapa_loss=0.0001554, whisper_loss=0.08367, over 17759.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001691, whisper_loss=0.09226, over 3862003.91 frames. ], batch size: 67, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:10:33,452 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 03:10:37,063 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 03:10:42,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1960490.0, ans=0.125 2024-08-13 03:10:55,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1960590.0, ans=0.125 2024-08-13 03:10:57,198 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-13 03:11:07,487 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2024-08-13 03:11:17,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-13 03:11:43,276 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 03:11:50,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7700, loss[loss=0.1105, beats_loss=0.01054, ecapa_loss=0.0001959, whisper_loss=0.09799, over 21900.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001687, whisper_loss=0.09219, over 3869747.74 frames. 
], batch size: 93, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:11:57,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1960990.0, ans=0.125 2024-08-13 03:12:11,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1961090.0, ans=0.2 2024-08-13 03:12:16,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1961090.0, ans=0.1 2024-08-13 03:12:19,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1961090.0, ans=0.125 2024-08-13 03:12:24,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-08-13 03:12:33,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2024-08-13 03:12:40,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1961290.0, ans=0.0 2024-08-13 03:12:44,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.500e+01 2.815e+01 3.285e+01 6.862e+01, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 03:12:55,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1961390.0, ans=0.125 2024-08-13 03:13:08,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7750, loss[loss=0.1235, beats_loss=0.009991, ecapa_loss=0.0001686, whisper_loss=0.1118, over 20529.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.000167, whisper_loss=0.09186, over 3883404.89 frames. 
], batch size: 80, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:13:28,917 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 03:13:43,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1961690.0, ans=0.0 2024-08-13 03:13:47,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1961690.0, ans=0.125 2024-08-13 03:13:49,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-08-13 03:14:01,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1961790.0, ans=0.125 2024-08-13 03:14:02,530 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 03:14:04,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1961790.0, ans=0.0 2024-08-13 03:14:04,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1961790.0, ans=10.0 2024-08-13 03:14:12,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1961890.0, ans=0.125 2024-08-13 03:14:15,885 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 15 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 03:14:20,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1961890.0, ans=0.1 2024-08-13 03:14:24,487 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 03:14:25,599 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7800, loss[loss=0.1172, beats_loss=0.01077, ecapa_loss=0.0001452, whisper_loss=0.1049, over 20948.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.000166, whisper_loss=0.09114, over 3885735.99 frames. ], batch size: 84, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:14:28,952 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 26 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-13 03:14:32,638 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.610e+00 2024-08-13 03:14:44,485 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 03:14:49,434 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 03:15:04,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1962190.0, ans=0.125 2024-08-13 03:15:09,278 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 03:15:19,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.419e+01 2.661e+01 3.121e+01 6.090e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 03:15:29,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1962390.0, ans=0.1 2024-08-13 03:15:34,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1962390.0, ans=0.2 2024-08-13 03:15:43,422 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7850, loss[loss=0.1238, beats_loss=0.009137, ecapa_loss=0.0002068, whisper_loss=0.1126, over 17888.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001666, whisper_loss=0.09179, over 3888710.12 frames. ], batch size: 73, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:15:48,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-13 03:15:54,744 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-13 03:15:59,823 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-13 03:16:02,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1962590.0, ans=0.0 2024-08-13 03:16:14,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1962690.0, ans=0.125 2024-08-13 03:16:26,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=12.0 2024-08-13 03:16:30,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1962790.0, ans=10.0 2024-08-13 03:16:43,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-08-13 03:17:00,006 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7900, loss[loss=0.1138, beats_loss=0.01185, ecapa_loss=0.0001333, whisper_loss=0.1006, over 23209.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01097, ecapa_loss=0.0001664, whisper_loss=0.09221, over 3881543.08 frames. 
], batch size: 88, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:17:01,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1962990.0, ans=0.0 2024-08-13 03:17:29,424 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 03:17:44,158 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 03:17:52,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.438e+01 2.739e+01 3.083e+01 5.244e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-13 03:18:04,293 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 03:18:07,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-13 03:18:09,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=12.0 2024-08-13 03:18:14,313 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 7950, loss[loss=0.1099, beats_loss=0.01028, ecapa_loss=0.0001564, whisper_loss=0.09804, over 22745.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.0001673, whisper_loss=0.09132, over 3893209.15 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:18:14,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1963490.0, ans=0.07 2024-08-13 03:18:30,008 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 03:18:33,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.55 vs. 
limit=12.0 2024-08-13 03:18:34,295 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 03:18:37,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1963590.0, ans=0.1 2024-08-13 03:18:38,143 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 03:18:39,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1963590.0, ans=0.1 2024-08-13 03:18:43,807 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 03:18:53,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1963690.0, ans=0.2 2024-08-13 03:19:00,557 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 03:19:06,944 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 03:19:13,724 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 03:19:19,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1963890.0, ans=0.0 2024-08-13 03:19:28,753 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8000, loss[loss=0.1099, beats_loss=0.01046, ecapa_loss=0.0001322, whisper_loss=0.09813, over 17770.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001665, whisper_loss=0.09128, over 3881697.41 frames. 
], batch size: 67, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:19:44,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1964090.0, ans=0.2 2024-08-13 03:19:54,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1964090.0, ans=0.1 2024-08-13 03:20:01,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1964190.0, ans=0.1 2024-08-13 03:20:12,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1964290.0, ans=0.0 2024-08-13 03:20:21,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.304e+01 2.712e+01 2.987e+01 5.432e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 03:20:32,378 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 03:20:42,381 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8050, loss[loss=0.1083, beats_loss=0.0121, ecapa_loss=0.0001462, whisper_loss=0.0947, over 23750.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.000167, whisper_loss=0.09154, over 3856072.19 frames. ], batch size: 93, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:21:05,574 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 03:21:15,106 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-13 03:21:16,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1964690.0, ans=15.0 2024-08-13 03:21:26,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1964790.0, ans=0.2 2024-08-13 03:21:27,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1964790.0, ans=0.0 2024-08-13 03:21:41,789 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 03:21:42,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1964890.0, ans=0.1 2024-08-13 03:21:46,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1964890.0, ans=0.125 2024-08-13 03:21:51,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8100, loss[loss=0.104, beats_loss=0.01168, ecapa_loss=0.000187, whisper_loss=0.09049, over 22656.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.011, ecapa_loss=0.0001669, whisper_loss=0.09053, over 3866269.45 frames. ], batch size: 91, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:22:04,800 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 03:22:15,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1965090.0, ans=0.2 2024-08-13 03:22:20,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1965190.0, ans=0.2 2024-08-13 03:22:22,820 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 03:22:34,920 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 03:22:39,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.433e+01 2.725e+01 3.019e+01 1.220e+02, threshold=5.449e+01, percent-clipped=1.0 2024-08-13 03:22:49,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1965390.0, ans=0.125 2024-08-13 03:22:58,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1965390.0, ans=0.125 2024-08-13 03:23:01,261 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8150, loss[loss=0.1099, beats_loss=0.009981, ecapa_loss=0.0002067, whisper_loss=0.09787, over 18856.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01096, ecapa_loss=0.0001669, whisper_loss=0.09104, over 3871210.71 frames. ], batch size: 77, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:23:11,338 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 15 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 03:23:19,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1965590.0, ans=0.1 2024-08-13 03:23:27,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1965690.0, ans=0.125 2024-08-13 03:23:27,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1965690.0, ans=0.0 2024-08-13 03:23:33,247 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 03:23:58,210 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 
20 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-13 03:24:01,185 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.759e+01 2024-08-13 03:24:02,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1965890.0, ans=0.0 2024-08-13 03:24:10,433 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8200, loss[loss=0.1047, beats_loss=0.009747, ecapa_loss=0.0002335, whisper_loss=0.09258, over 20942.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001676, whisper_loss=0.09142, over 3908295.70 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:24:13,542 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-13 03:24:17,812 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 03:24:27,266 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 03:24:31,504 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 03:24:45,969 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 03:24:53,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1966290.0, ans=0.125 2024-08-13 03:24:56,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-08-13 03:24:57,468 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 03:24:58,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.561e+01 2.768e+01 3.091e+01 7.365e+01, threshold=5.537e+01, percent-clipped=2.0 2024-08-13 03:25:19,155 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8250, loss[loss=0.07154, beats_loss=0.0137, ecapa_loss=0.0001645, whisper_loss=0.05619, over 15433.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001685, whisper_loss=0.09151, over 3909793.55 frames. ], batch size: 62, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:25:28,609 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 03:25:31,188 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-13 03:25:35,137 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 03:25:39,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1966590.0, ans=0.0 2024-08-13 03:25:43,194 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 19 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-13 03:25:48,533 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 40 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 03:25:51,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2024-08-13 03:25:53,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1966690.0, ans=0.0 2024-08-13 03:26:07,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1966790.0, ans=0.125 2024-08-13 03:26:08,573 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 03:26:11,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1966890.0, ans=0.125 2024-08-13 03:26:25,398 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8300, loss[loss=0.09504, beats_loss=0.01112, ecapa_loss=0.0001674, whisper_loss=0.08225, over 21790.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001675, whisper_loss=0.09127, over 3877276.30 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:26:25,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1966990.0, ans=0.125 2024-08-13 03:26:28,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-08-13 03:26:33,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1966990.0, ans=0.125 2024-08-13 03:26:45,576 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 03:26:45,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1967090.0, ans=0.1 2024-08-13 03:26:49,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1967090.0, ans=0.5 2024-08-13 03:26:55,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1967190.0, ans=0.1 2024-08-13 03:27:12,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.98 vs. 
limit=15.0 2024-08-13 03:27:12,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.397e+01 2.699e+01 2.951e+01 6.635e+01, threshold=5.397e+01, percent-clipped=2.0 2024-08-13 03:27:20,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1967390.0, ans=0.125 2024-08-13 03:27:21,316 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 03:27:28,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1967390.0, ans=0.125 2024-08-13 03:27:29,223 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 03:27:31,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2024-08-13 03:27:33,323 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8350, loss[loss=0.09416, beats_loss=0.01191, ecapa_loss=9.595e-05, whisper_loss=0.08129, over 18474.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001686, whisper_loss=0.09142, over 3893325.74 frames. ], batch size: 68, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:14,472 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 03:28:14,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1967790.0, ans=10.0 2024-08-13 03:28:14,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1967790.0, ans=0.125 2024-08-13 03:28:34,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. 
limit=15.0 2024-08-13 03:28:38,357 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 03:28:38,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1967890.0, ans=0.1 2024-08-13 03:28:42,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8400, loss[loss=0.1336, beats_loss=0.006866, ecapa_loss=0.0001679, whisper_loss=0.1251, over 19341.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001693, whisper_loss=0.09102, over 3901104.89 frames. ], batch size: 70, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:43,953 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 03:29:20,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1968190.0, ans=0.2 2024-08-13 03:29:30,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.441e+01 2.759e+01 3.099e+01 1.310e+02, threshold=5.518e+01, percent-clipped=1.0 2024-08-13 03:29:34,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1968290.0, ans=0.125 2024-08-13 03:29:37,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1968390.0, ans=0.1 2024-08-13 03:29:41,050 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-13 03:29:44,342 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2024-08-13 03:29:51,591 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8450, loss[loss=0.1009, beats_loss=0.008842, ecapa_loss=0.0001778, whisper_loss=0.09023, over 21932.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001698, whisper_loss=0.09105, over 3879003.93 frames. ], batch size: 86, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:29:54,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1968490.0, ans=0.09899494936611666 2024-08-13 03:30:04,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1968590.0, ans=0.0 2024-08-13 03:30:11,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-13 03:30:26,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1968690.0, ans=0.125 2024-08-13 03:30:59,953 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8500, loss[loss=0.09693, beats_loss=0.01165, ecapa_loss=0.0001969, whisper_loss=0.08331, over 22099.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01092, ecapa_loss=0.000169, whisper_loss=0.09015, over 3857601.32 frames. ], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:31:22,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1969090.0, ans=0.0 2024-08-13 03:31:23,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1969090.0, ans=0.0 2024-08-13 03:31:30,814 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 03:31:45,760 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
26 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 03:31:47,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1969290.0, ans=0.0 2024-08-13 03:31:48,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.417e+01 2.734e+01 3.054e+01 8.886e+01, threshold=5.467e+01, percent-clipped=1.0 2024-08-13 03:31:50,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1969290.0, ans=0.125 2024-08-13 03:31:55,294 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 27 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 03:31:56,468 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 03:32:01,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1969390.0, ans=0.125 2024-08-13 03:32:08,470 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8550, loss[loss=0.1066, beats_loss=0.01203, ecapa_loss=0.0001875, whisper_loss=0.09265, over 21829.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001691, whisper_loss=0.09099, over 3872167.88 frames. ], batch size: 91, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:32:23,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.50 vs. limit=15.0 2024-08-13 03:32:26,641 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-13 03:32:50,978 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 03:33:16,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8600, loss[loss=0.1037, beats_loss=0.01049, ecapa_loss=0.0001655, whisper_loss=0.0916, over 22343.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001684, whisper_loss=0.09162, over 3905607.00 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:33:29,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5 2024-08-13 03:33:31,176 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-13 03:33:37,117 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 03:33:40,036 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 03:34:06,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-08-13 03:34:06,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.434e+01 2.755e+01 2.994e+01 8.345e+01, threshold=5.511e+01, percent-clipped=1.0 2024-08-13 03:34:23,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1970390.0, ans=0.0 2024-08-13 03:34:28,468 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8650, loss[loss=0.1159, beats_loss=0.01109, ecapa_loss=0.0001752, whisper_loss=0.103, over 22731.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001694, whisper_loss=0.09128, over 3877999.55 frames. 
], batch size: 90, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:34:35,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1970490.0, ans=0.125 2024-08-13 03:34:50,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1970590.0, ans=0.125 2024-08-13 03:34:53,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1970590.0, ans=0.125 2024-08-13 03:35:00,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1970690.0, ans=0.2 2024-08-13 03:35:00,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1970690.0, ans=0.1 2024-08-13 03:35:44,095 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8700, loss[loss=0.1105, beats_loss=0.009637, ecapa_loss=0.0001774, whisper_loss=0.09906, over 17803.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001703, whisper_loss=0.09089, over 3874966.36 frames. ], batch size: 70, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:36:09,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1971090.0, ans=0.125 2024-08-13 03:36:09,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1971090.0, ans=0.125 2024-08-13 03:36:40,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.522e+01 2.761e+01 3.315e+01 1.069e+02, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 03:36:43,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.79 vs. 
limit=15.0 2024-08-13 03:36:54,752 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 03:37:05,253 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8750, loss[loss=0.05723, beats_loss=0.0113, ecapa_loss=0.0001943, whisper_loss=0.04398, over 13160.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001709, whisper_loss=0.09092, over 3838623.26 frames. ], batch size: 54, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:37:10,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1971490.0, ans=0.125 2024-08-13 03:37:15,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1971490.0, ans=0.1 2024-08-13 03:37:17,176 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 03:37:44,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1971690.0, ans=0.2 2024-08-13 03:38:24,581 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8800, loss[loss=0.1139, beats_loss=0.008643, ecapa_loss=0.0001651, whisper_loss=0.1036, over 18046.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.0001695, whisper_loss=0.09114, over 3871257.40 frames. ], batch size: 71, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:38:41,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1972090.0, ans=0.0 2024-08-13 03:39:03,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. 
limit=15.0 2024-08-13 03:39:23,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.400e+01 2.713e+01 2.983e+01 4.963e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-13 03:39:24,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1972290.0, ans=0.125 2024-08-13 03:39:27,942 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 03:39:46,407 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8850, loss[loss=0.105, beats_loss=0.008818, ecapa_loss=0.000201, whisper_loss=0.09413, over 17073.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01103, ecapa_loss=0.0001685, whisper_loss=0.08996, over 3857910.04 frames. ], batch size: 69, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:39:46,616 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 03:39:48,627 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 03:40:09,266 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 03:40:13,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1972590.0, ans=0.0 2024-08-13 03:40:21,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1972690.0, ans=0.04949747468305833 2024-08-13 03:40:27,549 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-13 03:40:34,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2024-08-13 03:40:44,052 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 03:41:06,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1972890.0, ans=0.125 2024-08-13 03:41:08,074 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8900, loss[loss=0.1051, beats_loss=0.01218, ecapa_loss=0.0001664, whisper_loss=0.09122, over 20559.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01113, ecapa_loss=0.000167, whisper_loss=0.08973, over 3853629.89 frames. ], batch size: 85, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:41:26,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-13 03:41:40,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1973190.0, ans=0.125 2024-08-13 03:41:59,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1973290.0, ans=0.125 2024-08-13 03:42:01,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1973290.0, ans=0.1 2024-08-13 03:42:01,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.80 vs. limit=10.0 2024-08-13 03:42:05,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.456e+01 2.768e+01 3.242e+01 5.170e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-13 03:42:06,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1973290.0, ans=0.2 2024-08-13 03:42:12,527 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 03:42:17,717 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 03:42:29,729 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 8950, loss[loss=0.1088, beats_loss=0.00992, ecapa_loss=0.0001755, whisper_loss=0.09716, over 17907.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01106, ecapa_loss=0.0001671, whisper_loss=0.09013, over 3860503.63 frames. ], batch size: 68, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:42:34,802 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-13 03:42:40,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1973490.0, ans=0.125 2024-08-13 03:42:51,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2024-08-13 03:42:51,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-13 03:43:07,193 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 03:43:07,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1973690.0, ans=0.0 2024-08-13 03:43:10,345 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 03:43:27,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1973790.0, ans=0.05 2024-08-13 03:43:33,011 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-13 03:43:47,946 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9000, loss[loss=0.09431, beats_loss=0.01363, ecapa_loss=0.0001453, whisper_loss=0.07924, over 17246.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01097, ecapa_loss=0.0001675, whisper_loss=0.09051, over 3855609.27 frames. ], batch size: 70, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:43:47,946 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 03:44:01,775 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8981, 3.1298, 3.9960, 4.3124], device='cuda:1') 2024-08-13 03:44:28,222 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005752, whisper_loss=0.2484, over 922467.00 frames. 2024-08-13 03:44:46,283 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on SV_voxceleb1: loss=0.004584, beats_loss=0, ecapa_loss=0.0004584, whisper_loss=0, over 939242.00 frames. 2024-08-13 03:46:42,089 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on AT_audioset: loss=0.02386, beats_loss=0.02386, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 03:46:42,093 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 03:47:10,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1974090.0, ans=0.0 2024-08-13 03:47:38,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1974290.0, ans=0.2 2024-08-13 03:47:39,062 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 03:47:41,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.559e+01 2.793e+01 3.240e+01 5.167e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-13 03:47:58,407 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.708e+00 2024-08-13 03:47:59,352 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 03:48:07,093 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9050, loss[loss=0.08357, beats_loss=0.01223, ecapa_loss=0.0001764, whisper_loss=0.06958, over 22479.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001673, whisper_loss=0.09116, over 3871503.97 frames. ], batch size: 96, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:48:13,387 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 03:48:22,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1974590.0, ans=0.2 2024-08-13 03:48:22,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1974590.0, ans=0.125 2024-08-13 03:49:00,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1974790.0, ans=0.125 2024-08-13 03:49:07,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1974790.0, ans=0.125 2024-08-13 03:49:13,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1974890.0, ans=0.125 2024-08-13 03:49:17,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1974890.0, ans=0.1 2024-08-13 
03:49:28,249 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9100, loss[loss=0.09391, beats_loss=0.01333, ecapa_loss=0.0001229, whisper_loss=0.07935, over 22675.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001682, whisper_loss=0.0911, over 3883695.19 frames. ], batch size: 93, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:49:32,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1974990.0, ans=0.0 2024-08-13 03:49:32,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1974990.0, ans=0.0 2024-08-13 03:49:44,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1975090.0, ans=0.1 2024-08-13 03:49:45,861 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 21 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 03:50:02,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1975190.0, ans=0.125 2024-08-13 03:50:19,519 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 03:50:26,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.465e+01 2.794e+01 3.182e+01 5.687e+01, threshold=5.588e+01, percent-clipped=1.0 2024-08-13 03:50:42,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-13 03:50:51,243 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 14 from Vox, 53 fro AS 2024-08-13 03:50:52,135 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9150, loss[loss=0.09799, beats_loss=0.01477, ecapa_loss=0.0001097, whisper_loss=0.08211, over 23143.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01104, ecapa_loss=0.0001655, whisper_loss=0.09084, over 3891429.78 frames. ], batch size: 92, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:51:06,420 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 03:51:16,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1975590.0, ans=0.0 2024-08-13 03:51:58,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1975890.0, ans=0.125 2024-08-13 03:52:00,212 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 03:52:03,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1975890.0, ans=0.0 2024-08-13 03:52:13,390 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9200, loss[loss=0.09924, beats_loss=0.01227, ecapa_loss=0.0001634, whisper_loss=0.08534, over 21844.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01098, ecapa_loss=0.0001665, whisper_loss=0.09146, over 3891318.91 frames. ], batch size: 92, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:52:15,790 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 03:52:27,793 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 03:52:35,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1976090.0, ans=0.05 2024-08-13 03:52:48,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1976190.0, ans=0.025 2024-08-13 03:53:03,201 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-13 03:53:11,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.444e+01 2.723e+01 3.266e+01 6.783e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-13 03:53:11,277 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 03:53:24,494 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 03:53:32,661 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9250, loss[loss=0.09787, beats_loss=0.01013, ecapa_loss=0.0001435, whisper_loss=0.08631, over 16031.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.000167, whisper_loss=0.09101, over 3866306.06 frames. ], batch size: 61, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:53:58,518 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 03:54:01,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1976590.0, ans=0.125 2024-08-13 03:54:05,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1976690.0, ans=0.0 2024-08-13 03:54:12,168 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 03:54:17,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-08-13 03:54:26,716 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
30 from LS+wenet, 14 from Vox, 16 fro AS 2024-08-13 03:54:26,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1976790.0, ans=0.125 2024-08-13 03:54:40,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1976890.0, ans=0.2 2024-08-13 03:54:42,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2024-08-13 03:54:47,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=15.0 2024-08-13 03:54:54,731 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9300, loss[loss=0.1237, beats_loss=0.00943, ecapa_loss=0.0001546, whisper_loss=0.1127, over 18091.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001678, whisper_loss=0.09067, over 3844203.69 frames. ], batch size: 67, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:54:56,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1976990.0, ans=0.1 2024-08-13 03:55:11,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1977090.0, ans=0.1 2024-08-13 03:55:13,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1977090.0, ans=0.025 2024-08-13 03:55:22,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1977090.0, ans=0.1 2024-08-13 03:55:24,774 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 
26 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-13 03:55:25,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1977090.0, ans=0.125 2024-08-13 03:55:26,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.57 vs. limit=10.0 2024-08-13 03:55:35,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1977190.0, ans=0.0 2024-08-13 03:55:41,103 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 03:55:49,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1977290.0, ans=0.125 2024-08-13 03:55:52,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=12.0 2024-08-13 03:55:55,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.460e+01 2.642e+01 2.957e+01 1.771e+02, threshold=5.283e+01, percent-clipped=2.0 2024-08-13 03:55:57,871 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 03:56:02,293 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 03:56:15,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-08-13 03:56:18,757 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9350, loss[loss=0.09819, beats_loss=0.01124, ecapa_loss=0.0001983, whisper_loss=0.08496, over 17043.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01089, ecapa_loss=0.0001672, whisper_loss=0.09068, over 3836267.19 frames. 
], batch size: 69, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:56:33,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1977590.0, ans=0.035 2024-08-13 03:56:43,415 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2024-08-13 03:56:56,953 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 03:57:00,363 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 03:57:22,762 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 03:57:27,289 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 03:57:38,655 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9400, loss[loss=0.1334, beats_loss=0.008966, ecapa_loss=0.000168, whisper_loss=0.1228, over 24252.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001665, whisper_loss=0.09091, over 3856367.14 frames. ], batch size: 93, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:57:45,272 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 03:57:53,202 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 03:57:53,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1978090.0, ans=0.1 2024-08-13 03:57:55,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1978090.0, ans=0.125 2024-08-13 03:57:56,713 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-13 03:58:09,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1978190.0, ans=0.035 2024-08-13 03:58:22,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1978190.0, ans=0.2 2024-08-13 03:58:23,425 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 03:58:27,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1978290.0, ans=0.0 2024-08-13 03:58:29,667 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 03:58:31,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1978290.0, ans=0.125 2024-08-13 03:58:34,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.425e+01 2.641e+01 3.063e+01 7.732e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-13 03:58:57,134 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9450, loss[loss=0.1133, beats_loss=0.01259, ecapa_loss=0.0001462, whisper_loss=0.09923, over 21907.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01093, ecapa_loss=0.0001681, whisper_loss=0.0907, over 3846600.16 frames. ], batch size: 87, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:59:25,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1978590.0, ans=0.0 2024-08-13 03:59:33,993 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 03:59:44,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. 
limit=15.0 2024-08-13 04:00:01,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5 2024-08-13 04:00:07,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-13 04:00:10,941 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 04:00:17,079 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9500, loss[loss=0.1108, beats_loss=0.009864, ecapa_loss=0.000163, whisper_loss=0.0993, over 20137.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.0001683, whisper_loss=0.09073, over 3860386.84 frames. ], batch size: 75, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:00:33,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1979090.0, ans=0.1 2024-08-13 04:00:43,822 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 04:00:48,101 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 04:00:53,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1979190.0, ans=0.125 2024-08-13 04:00:59,124 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 04:01:11,208 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:01:12,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.497e+01 2.737e+01 3.144e+01 1.195e+02, threshold=5.474e+01, percent-clipped=3.0 2024-08-13 04:01:21,633 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
28 from LS+wenet, 23 from Vox, 26 from AS 2024-08-13 04:01:33,975 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9550, loss[loss=0.1232, beats_loss=0.009747, ecapa_loss=0.0001777, whisper_loss=0.1117, over 18970.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01099, ecapa_loss=0.0001695, whisper_loss=0.09035, over 3871561.17 frames. ], batch size: 74, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:01:43,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2024-08-13 04:02:01,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1979590.0, ans=0.125 2024-08-13 04:02:07,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1979690.0, ans=0.125 2024-08-13 04:02:10,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1979690.0, ans=0.125 2024-08-13 04:02:17,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1979790.0, ans=0.125 2024-08-13 04:02:32,817 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 from AS 2024-08-13 04:02:33,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-13 04:02:38,436 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 from AS 2024-08-13 04:02:46,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.89 vs. 
limit=22.5 2024-08-13 04:02:46,648 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9600, loss[loss=0.1016, beats_loss=0.01215, ecapa_loss=0.0001664, whisper_loss=0.08782, over 23645.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01092, ecapa_loss=0.0001697, whisper_loss=0.09067, over 3864350.61 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:02:52,530 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 04:02:58,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1979990.0, ans=0.125 2024-08-13 04:02:58,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1979990.0, ans=0.125 2024-08-13 04:02:58,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-13 04:03:36,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.597e+01 2.785e+01 3.117e+01 4.817e+01, threshold=5.569e+01, percent-clipped=0.0 2024-08-13 04:03:40,613 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-13 04:03:55,291 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9650, loss[loss=0.1183, beats_loss=0.01036, ecapa_loss=0.0001669, whisper_loss=0.1062, over 19499.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001704, whisper_loss=0.09118, over 3856868.53 frames. ], batch size: 76, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:03:57,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. 
limit=10.0 2024-08-13 04:03:58,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1980490.0, ans=0.0 2024-08-13 04:04:07,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1980490.0, ans=0.035 2024-08-13 04:04:07,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1980490.0, ans=0.125 2024-08-13 04:04:07,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1980490.0, ans=0.125 2024-08-13 04:04:42,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1980790.0, ans=0.125 2024-08-13 04:04:55,786 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 04:05:03,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=12.0 2024-08-13 04:05:05,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9700, loss[loss=0.07112, beats_loss=0.01574, ecapa_loss=0.0001386, whisper_loss=0.05399, over 14929.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001708, whisper_loss=0.09174, over 3866706.98 frames. 
], batch size: 59, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:05:07,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1980990.0, ans=0.0 2024-08-13 04:05:09,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1980990.0, ans=0.0 2024-08-13 04:05:15,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-13 04:05:16,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1980990.0, ans=0.0 2024-08-13 04:05:17,961 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 from AS 2024-08-13 04:05:29,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1981090.0, ans=0.125 2024-08-13 04:05:43,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1981190.0, ans=0.1 2024-08-13 04:05:51,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1981290.0, ans=0.125 2024-08-13 04:05:53,085 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 from AS 2024-08-13 04:05:55,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.457e+01 2.661e+01 2.979e+01 4.854e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 04:05:58,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1981290.0, ans=0.125 2024-08-13 04:06:01,416 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
31 from LS+wenet, 24 from Vox, 39 from AS 2024-08-13 04:06:01,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1981390.0, ans=0.125 2024-08-13 04:06:06,803 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 23 from LS+wenet, 13 from Vox, 25 from AS 2024-08-13 04:06:14,732 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9750, loss[loss=0.1234, beats_loss=0.008022, ecapa_loss=0.000179, whisper_loss=0.1135, over 19535.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.0001705, whisper_loss=0.09163, over 3848559.23 frames. ], batch size: 77, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:06:21,837 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 33 from Vox, 31 from AS 2024-08-13 04:06:22,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-13 04:06:24,558 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 15 from Vox, 35 from AS 2024-08-13 04:06:30,140 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-13 04:06:36,792 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 from AS 2024-08-13 04:06:46,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1981690.0, ans=0.125 2024-08-13 04:06:52,166 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-13 04:06:57,249 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. 
limit=10.0 2024-08-13 04:07:24,308 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9800, loss[loss=0.1107, beats_loss=0.01067, ecapa_loss=0.0001553, whisper_loss=0.09852, over 22720.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001691, whisper_loss=0.09156, over 3848577.16 frames. ], batch size: 89, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:07:31,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1981990.0, ans=0.0 2024-08-13 04:07:46,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1982090.0, ans=0.5 2024-08-13 04:07:46,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1982090.0, ans=0.95 2024-08-13 04:08:07,185 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 04:08:15,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.387e+01 2.562e+01 2.934e+01 4.315e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-13 04:08:16,896 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 from AS 2024-08-13 04:08:29,017 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 12 from Vox, 31 from AS 2024-08-13 04:08:30,487 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-13 04:08:32,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1982390.0, ans=0.125 2024-08-13 04:08:34,306 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9850, loss[loss=0.1048, beats_loss=0.01027, ecapa_loss=0.0002021, whisper_loss=0.09256, over 21579.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09175, over 3846289.91 frames. ], batch size: 89, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:08:47,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1982590.0, ans=0.125 2024-08-13 04:08:59,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1982590.0, ans=0.125 2024-08-13 04:09:10,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=22.5 2024-08-13 04:09:34,798 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 04:09:43,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1982990.0, ans=0.1 2024-08-13 04:09:44,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9900, loss[loss=0.09271, beats_loss=0.009554, ecapa_loss=0.0001933, whisper_loss=0.08123, over 18230.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001689, whisper_loss=0.09142, over 3853012.15 frames. ], batch size: 73, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:09:52,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1982990.0, ans=0.125 2024-08-13 04:09:58,304 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
14 from LS+wenet, 26 from Vox, 26 from AS 2024-08-13 04:10:07,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1983090.0, ans=0.0 2024-08-13 04:10:10,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1983190.0, ans=0.125 2024-08-13 04:10:14,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-13 04:10:26,163 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 from AS 2024-08-13 04:10:28,862 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 04:10:30,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1983290.0, ans=0.125 2024-08-13 04:10:31,758 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 from AS 2024-08-13 04:10:34,487 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.489e+01 2.832e+01 3.268e+01 9.650e+01, threshold=5.664e+01, percent-clipped=3.0 2024-08-13 04:10:47,320 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-08-13 04:10:53,007 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 9950, loss[loss=0.1199, beats_loss=0.006182, ecapa_loss=0.0001848, whisper_loss=0.1119, over 15170.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09141, over 3893877.99 frames. ], batch size: 58, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:10:58,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=28.44 vs. 
limit=22.5 2024-08-13 04:11:04,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1983490.0, ans=0.0 2024-08-13 04:11:08,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1983590.0, ans=0.0 2024-08-13 04:11:22,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1983690.0, ans=0.0 2024-08-13 04:11:32,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1983690.0, ans=0.125 2024-08-13 04:11:32,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1983690.0, ans=0.125 2024-08-13 04:11:40,059 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS 2024-08-13 04:11:45,491 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS 2024-08-13 04:11:47,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1983890.0, ans=0.125 2024-08-13 04:12:01,992 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10000, loss[loss=0.095, beats_loss=0.01235, ecapa_loss=0.0001353, whisper_loss=0.0813, over 20319.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001693, whisper_loss=0.09146, over 3881852.98 frames. ], batch size: 82, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:12:02,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5 2024-08-13 04:12:06,662 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 31 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 04:12:17,392 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
27 from LS+wenet, 23 from Vox, 35 from AS 2024-08-13 04:12:52,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.356e+01 2.631e+01 2.870e+01 5.046e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-13 04:13:11,560 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10050, loss[loss=0.1034, beats_loss=0.009238, ecapa_loss=0.0001952, whisper_loss=0.0922, over 17096.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001688, whisper_loss=0.0915, over 3850716.56 frames. ], batch size: 69, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:13:11,782 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 from AS 2024-08-13 04:13:14,533 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 04:13:15,810 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 04:13:18,617 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS 2024-08-13 04:13:21,317 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 26 from LS+wenet, 23 from Vox, 19 from AS 2024-08-13 04:13:33,017 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5 2024-08-13 04:13:44,685 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 04:13:51,622 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS 2024-08-13 04:13:54,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. 
limit=15.0 2024-08-13 04:13:59,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1984790.0, ans=0.07 2024-08-13 04:14:03,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1984790.0, ans=0.125 2024-08-13 04:14:03,321 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.870e+00 2024-08-13 04:14:12,484 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 from AS 2024-08-13 04:14:12,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1984890.0, ans=0.0 2024-08-13 04:14:20,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10100, loss[loss=0.104, beats_loss=0.01276, ecapa_loss=0.0001795, whisper_loss=0.08945, over 14564.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001692, whisper_loss=0.09117, over 3868889.99 frames. ], batch size: 60, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:14:33,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1985090.0, ans=0.2 2024-08-13 04:14:34,773 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 04:14:44,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1985090.0, ans=0.125 2024-08-13 04:14:47,182 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-13 04:15:06,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1985290.0, ans=0.125 2024-08-13 04:15:10,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.445e+01 2.629e+01 3.089e+01 3.463e+02, threshold=5.257e+01, percent-clipped=1.0 2024-08-13 04:15:17,477 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 22 from LS+wenet, 12 from Vox, 19 from AS 2024-08-13 04:15:24,248 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 04:15:29,713 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10150, loss[loss=0.1146, beats_loss=0.01149, ecapa_loss=0.0001603, whisper_loss=0.1015, over 21350.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01083, ecapa_loss=0.0001707, whisper_loss=0.09186, over 3885535.29 frames. ], batch size: 82, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:15:35,048 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 04:15:36,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1985490.0, ans=0.125 2024-08-13 04:15:38,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.15 vs. limit=10.0 2024-08-13 04:15:40,474 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 from AS 2024-08-13 04:15:41,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. 
limit=10.0 2024-08-13 04:15:45,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1985590.0, ans=0.0 2024-08-13 04:16:02,924 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 29 from LS+wenet, 14 from Vox, 29 from AS 2024-08-13 04:16:29,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1985890.0, ans=0.2 2024-08-13 04:16:29,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1985890.0, ans=0.95 2024-08-13 04:16:30,415 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 04:16:38,151 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10200, loss[loss=0.0908, beats_loss=0.01446, ecapa_loss=0.0001258, whisper_loss=0.07509, over 21396.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001709, whisper_loss=0.09181, over 3859642.43 frames. 
], batch size: 86, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:17:03,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1986090.0, ans=0.1 2024-08-13 04:17:09,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1986190.0, ans=0.0 2024-08-13 04:17:21,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1986290.0, ans=0.1 2024-08-13 04:17:22,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1986290.0, ans=0.125 2024-08-13 04:17:27,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.440e+01 2.685e+01 3.230e+01 3.990e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 04:17:45,910 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 24 from Vox, 27 from AS 2024-08-13 04:17:47,002 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10250, loss[loss=0.09333, beats_loss=0.0108, ecapa_loss=0.0001587, whisper_loss=0.08095, over 17173.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001719, whisper_loss=0.09151, over 3847534.09 frames. ], batch size: 68, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:18:00,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.60 vs. 
limit=15.0 2024-08-13 04:18:11,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1986590.0, ans=0.125 2024-08-13 04:18:14,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1986690.0, ans=0.0 2024-08-13 04:18:26,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1986790.0, ans=0.125 2024-08-13 04:18:47,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1986890.0, ans=0.125 2024-08-13 04:18:48,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-08-13 04:18:55,403 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10300, loss[loss=0.1113, beats_loss=0.01124, ecapa_loss=0.0001469, whisper_loss=0.09856, over 22678.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001696, whisper_loss=0.0914, over 3891971.37 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:19:03,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=15.0 2024-08-13 04:19:05,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1986990.0, ans=0.05 2024-08-13 04:19:21,421 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
31 from LS+wenet, 14 from Vox, 47 from AS 2024-08-13 04:19:29,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1987190.0, ans=0.125 2024-08-13 04:19:31,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-08-13 04:19:44,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.484e+01 2.743e+01 3.118e+01 4.422e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 04:19:52,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1987390.0, ans=0.025 2024-08-13 04:20:03,139 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10350, loss[loss=0.1016, beats_loss=0.0118, ecapa_loss=0.0001536, whisper_loss=0.0883, over 18022.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01087, ecapa_loss=0.000168, whisper_loss=0.09087, over 3888367.50 frames. ], batch size: 71, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:20:10,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1987490.0, ans=0.125 2024-08-13 04:20:18,356 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 04:20:25,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1987590.0, ans=0.1 2024-08-13 04:20:27,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-13 04:20:33,659 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
31 from LS+wenet, 23 from Vox, 30 from AS 2024-08-13 04:20:47,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1987790.0, ans=10.0 2024-08-13 04:20:54,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1987790.0, ans=0.125 2024-08-13 04:21:11,999 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10400, loss[loss=0.08047, beats_loss=0.01245, ecapa_loss=0.0001169, whisper_loss=0.06685, over 18447.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001677, whisper_loss=0.09056, over 3865165.60 frames. ], batch size: 72, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:21:12,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-13 04:21:13,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2024-08-13 04:21:26,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1988090.0, ans=0.05 2024-08-13 04:21:27,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2024-08-13 04:21:28,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1988090.0, ans=0.1 2024-08-13 04:21:40,293 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
16 from LS+wenet, 16 from Vox, 41 from AS 2024-08-13 04:21:40,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1988190.0, ans=0.1 2024-08-13 04:21:47,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1988190.0, ans=0.125 2024-08-13 04:21:50,157 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 from AS 2024-08-13 04:21:51,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1988190.0, ans=0.0 2024-08-13 04:22:01,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.435e+01 2.770e+01 3.094e+01 5.065e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 04:22:02,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1988290.0, ans=0.125 2024-08-13 04:22:13,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2024-08-13 04:22:21,430 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10450, loss[loss=0.09892, beats_loss=0.009845, ecapa_loss=0.0001362, whisper_loss=0.08772, over 14479.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01086, ecapa_loss=0.0001674, whisper_loss=0.08999, over 3842964.40 frames. ], batch size: 55, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:22:33,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. 
limit=15.0 2024-08-13 04:22:35,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1988590.0, ans=0.125 2024-08-13 04:22:39,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1988590.0, ans=0.0 2024-08-13 04:22:47,933 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-13 04:22:54,609 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 from AS 2024-08-13 04:22:58,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1988690.0, ans=0.0 2024-08-13 04:23:01,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1988790.0, ans=0.125 2024-08-13 04:23:04,070 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 from AS 2024-08-13 04:23:09,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1988790.0, ans=0.1 2024-08-13 04:23:26,011 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS 2024-08-13 04:23:30,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10500, loss[loss=0.1279, beats_loss=0.009287, ecapa_loss=0.0001494, whisper_loss=0.1171, over 20110.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001685, whisper_loss=0.09042, over 3832447.58 frames. 
], batch size: 72, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:23:41,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1988990.0, ans=0.125 2024-08-13 04:24:12,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2024-08-13 04:24:19,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1989290.0, ans=0.125 2024-08-13 04:24:21,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.377e+01 2.646e+01 2.972e+01 5.578e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-13 04:24:21,782 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-13 04:24:26,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1989390.0, ans=0.125 2024-08-13 04:24:43,042 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10550, loss[loss=0.1167, beats_loss=0.01042, ecapa_loss=0.000151, whisper_loss=0.1048, over 19061.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0109, ecapa_loss=0.0001667, whisper_loss=0.09068, over 3825130.25 frames. ], batch size: 71, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:24:59,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1989590.0, ans=0.1 2024-08-13 04:25:08,116 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.75 vs.
limit=12.0 2024-08-13 04:25:09,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1989590.0, ans=0.0 2024-08-13 04:25:13,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1989690.0, ans=0.125 2024-08-13 04:25:28,669 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 24 from Vox, 23 from AS 2024-08-13 04:25:34,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1989790.0, ans=0.125 2024-08-13 04:26:00,238 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10600, loss[loss=0.0996, beats_loss=0.009109, ecapa_loss=0.0002001, whisper_loss=0.08849, over 17776.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.000168, whisper_loss=0.09102, over 3853640.96 frames. ], batch size: 73, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:26:27,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=15.0 2024-08-13 04:26:33,774 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts.
25 from LS+wenet, 17 from Vox, 45 from AS 2024-08-13 04:26:42,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1990190.0, ans=0.0 2024-08-13 04:26:54,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.291e+01 2.645e+01 2.934e+01 5.325e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-13 04:26:57,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1990290.0, ans=0.125 2024-08-13 04:27:00,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1990390.0, ans=0.0 2024-08-13 04:27:02,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1990390.0, ans=0.1 2024-08-13 04:27:04,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1990390.0, ans=0.07 2024-08-13 04:27:15,545 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10650, loss[loss=0.1087, beats_loss=0.01212, ecapa_loss=0.0001738, whisper_loss=0.09484, over 19582.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01081, ecapa_loss=0.0001678, whisper_loss=0.09142, over 3852128.59 frames. ], batch size: 76, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:27:16,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1990490.0, ans=0.1 2024-08-13 04:27:29,939 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts.
26 from LS+wenet, 21 from Vox, 28 from AS 2024-08-13 04:27:39,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1990590.0, ans=0.125 2024-08-13 04:27:48,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1990690.0, ans=0.2 2024-08-13 04:27:53,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1990690.0, ans=0.2 2024-08-13 04:28:08,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2024-08-13 04:28:09,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1990790.0, ans=0.05 2024-08-13 04:28:32,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-13 04:28:35,220 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10700, loss[loss=0.1126, beats_loss=0.01153, ecapa_loss=0.0001134, whisper_loss=0.09997, over 18555.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001662, whisper_loss=0.092, over 3860346.53 frames.
], batch size: 70, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:28:37,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1990990.0, ans=0.0 2024-08-13 04:29:13,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1991190.0, ans=0.0 2024-08-13 04:29:15,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1991190.0, ans=0.125 2024-08-13 04:29:15,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1991190.0, ans=0.125 2024-08-13 04:29:19,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.75 vs. limit=10.0 2024-08-13 04:29:22,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1991290.0, ans=0.0 2024-08-13 04:29:29,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1991290.0, ans=0.125 2024-08-13 04:29:30,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.433e+01 2.666e+01 3.252e+01 5.472e+01, threshold=5.332e+01, percent-clipped=1.0 2024-08-13 04:29:37,922 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.972e-01 2024-08-13 04:29:41,816 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 from AS 2024-08-13 04:29:52,733 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10750, loss[loss=0.08777, beats_loss=0.01188, ecapa_loss=0.0001806, whisper_loss=0.07408, over 16493.00 frames.
], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001668, whisper_loss=0.09179, over 3852254.21 frames. ], batch size: 68, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:30:21,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-08-13 04:30:36,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1991690.0, ans=0.125 2024-08-13 04:31:04,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.42 vs. limit=22.5 2024-08-13 04:31:13,580 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10800, loss[loss=0.09927, beats_loss=0.01226, ecapa_loss=0.0001344, whisper_loss=0.08567, over 20421.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001668, whisper_loss=0.09192, over 3838330.71 frames. ], batch size: 81, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:31:40,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0 2024-08-13 04:31:46,355 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 from AS 2024-08-13 04:31:52,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs.
limit=15.0 2024-08-13 04:31:53,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1992190.0, ans=0.125 2024-08-13 04:31:54,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1992190.0, ans=0.125 2024-08-13 04:32:04,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1992290.0, ans=0.1 2024-08-13 04:32:05,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1992290.0, ans=0.125 2024-08-13 04:32:08,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1992290.0, ans=0.0 2024-08-13 04:32:10,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.408e+01 2.896e+01 3.475e+01 4.951e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 04:32:12,493 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-13 04:32:22,512 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 from AS 2024-08-13 04:32:28,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1992390.0, ans=0.0 2024-08-13 04:32:32,916 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10850, loss[loss=0.1219, beats_loss=0.01034, ecapa_loss=0.0001716, whisper_loss=0.1099, over 14259.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001677, whisper_loss=0.09209, over 3863571.93 frames.
], batch size: 54, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:32:33,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1992490.0, ans=0.125 2024-08-13 04:32:50,792 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 04:32:53,455 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 from AS 2024-08-13 04:33:02,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1992590.0, ans=0.1 2024-08-13 04:33:14,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1992690.0, ans=0.2 2024-08-13 04:33:19,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1992790.0, ans=0.05 2024-08-13 04:33:22,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1992790.0, ans=0.0 2024-08-13 04:33:29,870 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 04:33:31,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1992790.0, ans=0.1 2024-08-13 04:33:33,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-13 04:33:40,743 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
27 from LS+wenet, 21 from Vox, 44 from AS 2024-08-13 04:33:49,521 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.424e+00 2024-08-13 04:33:51,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10900, loss[loss=0.09576, beats_loss=0.01136, ecapa_loss=0.0001959, whisper_loss=0.08244, over 12752.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001674, whisper_loss=0.09201, over 3890135.33 frames. ], batch size: 53, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:34:08,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1993090.0, ans=0.0 2024-08-13 04:34:24,616 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 04:34:27,677 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 37 from LS+wenet, 16 from Vox, 37 from AS 2024-08-13 04:34:34,337 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.779e-02 2024-08-13 04:34:42,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1993290.0, ans=0.125 2024-08-13 04:34:51,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.538e+01 2.794e+01 3.172e+01 4.370e+01, threshold=5.589e+01, percent-clipped=0.0 2024-08-13 04:34:51,960 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 from AS 2024-08-13 04:35:01,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1993390.0, ans=0.1 2024-08-13 04:35:12,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 10950, loss[loss=0.1003, beats_loss=0.01088, ecapa_loss=0.0002269, whisper_loss=0.08716, over 20440.00 frames.
], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001669, whisper_loss=0.092, over 3911915.62 frames. ], batch size: 86, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:35:29,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1993590.0, ans=0.0 2024-08-13 04:35:36,161 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 04:35:37,529 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 12 from LS+wenet, 11 from Vox, 32 from AS 2024-08-13 04:35:43,405 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 from AS 2024-08-13 04:35:51,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1993690.0, ans=0.1 2024-08-13 04:35:54,593 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 18 from LS+wenet, 18 from Vox, 39 from AS 2024-08-13 04:36:02,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1993790.0, ans=0.125 2024-08-13 04:36:08,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2024-08-13 04:36:18,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1993890.0, ans=0.09899494936611666 2024-08-13 04:36:31,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1993890.0, ans=0.0 2024-08-13 04:36:32,333 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 29 from LS+wenet, 29 from Vox, 28 from AS 2024-08-13 04:36:33,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11000, loss[loss=0.1094, beats_loss=0.009163, ecapa_loss=0.0002685, whisper_loss=0.09758, over 19067.00 frames.
], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001685, whisper_loss=0.09224, over 3888264.75 frames. ], batch size: 86, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:36:35,518 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 from AS 2024-08-13 04:36:47,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1993990.0, ans=0.2 2024-08-13 04:36:49,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1994090.0, ans=0.125 2024-08-13 04:36:53,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=12.0 2024-08-13 04:37:00,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1994090.0, ans=0.1 2024-08-13 04:37:19,392 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 04:37:33,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.603e+01 2.980e+01 9.171e+01, threshold=5.207e+01, percent-clipped=2.0 2024-08-13 04:37:54,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11050, loss[loss=0.09994, beats_loss=0.01186, ecapa_loss=0.0001639, whisper_loss=0.08644, over 16408.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001693, whisper_loss=0.09243, over 3894090.12 frames.
], batch size: 66, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:38:13,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1994590.0, ans=0.1 2024-08-13 04:38:16,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1994590.0, ans=0.125 2024-08-13 04:38:17,394 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 from AS 2024-08-13 04:38:25,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1994690.0, ans=0.125 2024-08-13 04:38:31,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1994690.0, ans=0.0 2024-08-13 04:38:46,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1994790.0, ans=0.125 2024-08-13 04:38:52,536 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 32 from LS+wenet, 15 from Vox, 27 from AS 2024-08-13 04:38:54,196 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 from AS 2024-08-13 04:39:18,239 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11100, loss[loss=0.1159, beats_loss=0.01115, ecapa_loss=0.0001571, whisper_loss=0.1032, over 19929.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001681, whisper_loss=0.09201, over 3897597.08 frames. ], batch size: 78, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:39:22,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1994990.0, ans=0.0 2024-08-13 04:39:40,733 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts.
21 from LS+wenet, 21 from Vox, 25 from AS 2024-08-13 04:40:22,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1995290.0, ans=0.125 2024-08-13 04:40:23,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.346e+01 2.633e+01 2.953e+01 4.555e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 04:40:33,815 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 04:40:37,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1995390.0, ans=0.0 2024-08-13 04:40:40,012 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 from AS 2024-08-13 04:40:52,098 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11150, loss[loss=0.1145, beats_loss=0.008868, ecapa_loss=0.0001873, whisper_loss=0.1037, over 21433.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001678, whisper_loss=0.09171, over 3878837.75 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:41:04,273 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
36 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 04:41:18,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1995590.0, ans=0.125 2024-08-13 04:41:18,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1995590.0, ans=0.0 2024-08-13 04:41:19,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1995590.0, ans=0.0 2024-08-13 04:41:26,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1995590.0, ans=0.125 2024-08-13 04:42:22,680 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 04:42:26,154 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 from AS 2024-08-13 04:42:31,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1995890.0, ans=0.125 2024-08-13 04:42:40,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11200, loss[loss=0.08006, beats_loss=0.01162, ecapa_loss=0.0001619, whisper_loss=0.06683, over 15362.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001671, whisper_loss=0.09166, over 3877260.85 frames. ], batch size: 64, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:43:14,293 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-13 04:43:14,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1996090.0, ans=0.0 2024-08-13 04:43:22,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1996090.0, ans=0.2 2024-08-13 04:43:29,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1996190.0, ans=0.125 2024-08-13 04:43:43,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1996190.0, ans=0.0 2024-08-13 04:44:09,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1996290.0, ans=0.0 2024-08-13 04:44:12,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.527e+01 2.790e+01 3.048e+01 4.600e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 04:44:22,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1996390.0, ans=0.04949747468305833 2024-08-13 04:44:47,887 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11250, loss[loss=0.08805, beats_loss=0.01246, ecapa_loss=0.0001351, whisper_loss=0.07423, over 20781.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01069, ecapa_loss=0.0001679, whisper_loss=0.09292, over 3876042.79 frames. ], batch size: 86, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:44:54,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1996490.0, ans=0.0 2024-08-13 04:45:01,418 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts.
35 from LS+wenet, 27 from Vox, 33 from AS 2024-08-13 04:45:08,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1996490.0, ans=0.1 2024-08-13 04:45:16,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1996590.0, ans=0.0 2024-08-13 04:45:39,212 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 from AS 2024-08-13 04:46:36,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-13 04:46:38,435 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-13 04:46:46,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-13 04:46:51,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1996990.0, ans=0.2 2024-08-13 04:46:51,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1996990.0, ans=0.2 2024-08-13 04:46:52,351 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11300, loss[loss=0.1217, beats_loss=0.009417, ecapa_loss=0.0001751, whisper_loss=0.1105, over 22965.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01068, ecapa_loss=0.0001681, whisper_loss=0.09293, over 3894612.38 frames.
], batch size: 89, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:46:59,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1996990.0, ans=0.0 2024-08-13 04:47:04,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1996990.0, ans=0.125 2024-08-13 04:47:34,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1997090.0, ans=0.0 2024-08-13 04:47:38,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1997090.0, ans=0.125 2024-08-13 04:47:52,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1997190.0, ans=0.125 2024-08-13 04:47:57,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1997190.0, ans=0.0 2024-08-13 04:48:27,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.451e+01 2.765e+01 3.179e+01 5.185e+01, threshold=5.530e+01, percent-clipped=0.0 2024-08-13 04:48:40,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1997390.0, ans=0.125 2024-08-13 04:48:44,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1997390.0, ans=0.0 2024-08-13 04:49:00,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11350, loss[loss=0.1279, beats_loss=0.01058, ecapa_loss=0.0001619, whisper_loss=0.1157, over 24289.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01066, ecapa_loss=0.0001686, whisper_loss=0.09314, over 3875223.21 frames. ], batch size: 93, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:49:07,268 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 04:49:09,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1997490.0, ans=0.2 2024-08-13 04:49:15,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-13 04:49:18,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1997590.0, ans=0.0 2024-08-13 04:49:32,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1997590.0, ans=0.0 2024-08-13 04:49:33,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1997690.0, ans=0.125 2024-08-13 04:49:40,376 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 from AS 2024-08-13 04:49:47,075 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-13 04:49:47,623 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-13 04:50:18,253 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS 2024-08-13 04:50:24,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1997890.0, ans=0.0 2024-08-13 04:50:29,815 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11400, loss[loss=0.08376, beats_loss=0.0112, ecapa_loss=0.0001193, whisper_loss=0.07137, over 15040.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01076, ecapa_loss=0.0001671, whisper_loss=0.09251, over 3863742.43 frames.
], batch size: 54, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:50:40,341 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS 2024-08-13 04:50:48,313 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 from AS 2024-08-13 04:50:50,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-13 04:51:00,482 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2024-08-13 04:51:10,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1998190.0, ans=0.125 2024-08-13 04:51:13,241 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 04:51:39,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.469e+01 2.790e+01 3.072e+01 4.491e+01, threshold=5.580e+01, percent-clipped=0.0 2024-08-13 04:51:39,998 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 from AS 2024-08-13 04:52:03,968 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11450, loss[loss=0.1055, beats_loss=0.01158, ecapa_loss=0.0001617, whisper_loss=0.09229, over 22137.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.0001664, whisper_loss=0.09192, over 3872423.30 frames. ], batch size: 90, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:52:36,172 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 from AS 2024-08-13 04:52:45,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.97 vs.
limit=15.0 2024-08-13 04:52:46,558 INFO [train_multi_KD3.py:844] (1/4) A total of 97 cuts. 32 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 04:52:46,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1998690.0, ans=0.0 2024-08-13 04:53:07,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1998790.0, ans=0.2 2024-08-13 04:53:13,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-08-13 04:53:31,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1998890.0, ans=0.1 2024-08-13 04:53:38,119 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11500, loss[loss=0.08734, beats_loss=0.01143, ecapa_loss=0.0001552, whisper_loss=0.07436, over 15049.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01081, ecapa_loss=0.0001683, whisper_loss=0.09138, over 3875023.50 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:53:40,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1998990.0, ans=0.025 2024-08-13 04:53:57,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1999090.0, ans=0.1 2024-08-13 04:54:17,736 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
29 from LS+wenet, 21 from Vox, 25 from AS 2024-08-13 04:54:21,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1999190.0, ans=0.1 2024-08-13 04:54:45,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.530e+01 2.837e+01 3.156e+01 6.576e+01, threshold=5.675e+01, percent-clipped=1.0 2024-08-13 04:54:47,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1999290.0, ans=0.125 2024-08-13 04:54:51,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-13 04:54:52,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1999390.0, ans=0.0 2024-08-13 04:54:59,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1999390.0, ans=0.125 2024-08-13 04:55:07,904 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11550, loss[loss=0.1165, beats_loss=0.0102, ecapa_loss=0.0001968, whisper_loss=0.1043, over 15845.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001689, whisper_loss=0.09204, over 3910350.72 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:55:33,302 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 34 from LS+wenet, 20 from Vox, 30 from AS 2024-08-13 04:55:37,572 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 27 from Vox, 39 from AS 2024-08-13 04:55:57,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1999690.0, ans=0.0 2024-08-13 04:55:59,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-13 04:56:01,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-08-13 04:56:05,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2024-08-13 04:56:17,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=15.0 2024-08-13 04:56:23,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1999890.0, ans=0.0 2024-08-13 04:56:31,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1999890.0, ans=0.0 2024-08-13 04:56:34,413 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 27 from Vox, 30 from AS 2024-08-13 04:56:40,272 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11600, loss[loss=0.09948, beats_loss=0.009825, ecapa_loss=0.000212, whisper_loss=0.08753, over 13080.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001688, whisper_loss=0.09207, over 3900551.38 frames. ], batch size: 56, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:56:45,151 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
18 from LS+wenet, 18 from Vox, 18 from AS 2024-08-13 04:56:48,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1999990.0, ans=0.125 2024-08-13 04:56:51,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1999990.0, ans=0.125 2024-08-13 04:57:06,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2000090.0, ans=0.0 2024-08-13 04:57:16,055 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:57:25,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2024-08-13 04:57:53,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2000290.0, ans=10.0 2024-08-13 04:57:56,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.423e+01 2.636e+01 2.832e+01 7.836e+01, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 04:58:03,337 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-13 04:58:22,169 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11650, loss[loss=0.1015, beats_loss=0.0133, ecapa_loss=0.0001293, whisper_loss=0.08687, over 16599.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01074, ecapa_loss=0.0001686, whisper_loss=0.09189, over 3923989.93 frames. 
], batch size: 65, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:58:24,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2000490.0, ans=0.125 2024-08-13 04:58:30,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2000490.0, ans=0.0 2024-08-13 04:58:39,511 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS 2024-08-13 04:58:59,493 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.630e-02 2024-08-13 04:59:13,633 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 04:59:15,513 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 04:59:35,641 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 04:59:56,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11700, loss[loss=0.08443, beats_loss=0.01436, ecapa_loss=0.0002014, whisper_loss=0.06806, over 20938.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001683, whisper_loss=0.09152, over 3939743.58 frames. ], batch size: 92, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:00:11,990 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-13 05:00:23,745 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
25 from LS+wenet, 22 from Vox, 26 from AS 2024-08-13 05:01:00,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2001290.0, ans=0.1 2024-08-13 05:01:00,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2001290.0, ans=0.0 2024-08-13 05:01:01,173 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.34 vs. limit=10.0 2024-08-13 05:01:07,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.358e+01 2.707e+01 3.132e+01 5.516e+01, threshold=5.414e+01, percent-clipped=1.0 2024-08-13 05:01:17,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2001390.0, ans=0.125 2024-08-13 05:01:30,500 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11750, loss[loss=0.1232, beats_loss=0.009188, ecapa_loss=0.0001622, whisper_loss=0.1124, over 19841.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.0001682, whisper_loss=0.0917, over 3922859.00 frames. ], batch size: 78, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:01:39,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.86 vs. limit=22.5 2024-08-13 05:01:47,807 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 from AS 2024-08-13 05:01:54,641 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 33 from LS+wenet, 10 from Vox, 32 from AS 2024-08-13 05:02:15,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2001690.0, ans=0.125 2024-08-13 05:02:16,675 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
22 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 05:02:36,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2001790.0, ans=0.2 2024-08-13 05:02:42,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2001790.0, ans=0.125 2024-08-13 05:03:03,009 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11800, loss[loss=0.121, beats_loss=0.01167, ecapa_loss=0.0001322, whisper_loss=0.108, over 18619.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09207, over 3926536.18 frames. ], batch size: 70, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:03:08,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2001990.0, ans=0.0 2024-08-13 05:03:49,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2002190.0, ans=0.0 2024-08-13 05:04:04,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2002290.0, ans=0.125 2024-08-13 05:04:06,134 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.539e+01 2.830e+01 3.148e+01 9.366e+01, threshold=5.659e+01, percent-clipped=1.0 2024-08-13 05:04:08,549 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 from AS 2024-08-13 05:04:09,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. 
limit=22.5 2024-08-13 05:04:24,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2002390.0, ans=0.05 2024-08-13 05:04:29,041 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11850, loss[loss=0.08816, beats_loss=0.01084, ecapa_loss=0.0001769, whisper_loss=0.07555, over 22213.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001679, whisper_loss=0.09104, over 3929792.91 frames. ], batch size: 91, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:05:10,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2002690.0, ans=0.2 2024-08-13 05:05:10,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-13 05:05:24,490 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 26 from Vox, 30 from AS 2024-08-13 05:05:28,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2002790.0, ans=0.125 2024-08-13 05:05:45,171 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 from AS 2024-08-13 05:05:50,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2002890.0, ans=0.125 2024-08-13 05:05:57,507 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11900, loss[loss=0.09923, beats_loss=0.01032, ecapa_loss=0.0001753, whisper_loss=0.08715, over 19910.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001677, whisper_loss=0.092, over 3939579.20 frames. 
], batch size: 82, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:06:13,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2003090.0, ans=0.125 2024-08-13 05:06:18,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2003090.0, ans=0.125 2024-08-13 05:06:27,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2003090.0, ans=0.1 2024-08-13 05:06:29,495 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 05:06:34,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2003190.0, ans=0.125 2024-08-13 05:06:49,529 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 05:07:00,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.520e+01 2.675e+01 3.005e+01 5.998e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 05:07:05,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2003390.0, ans=0.125 2024-08-13 05:07:23,697 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 11950, loss[loss=0.1068, beats_loss=0.008104, ecapa_loss=0.0001527, whisper_loss=0.09718, over 16414.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.0001689, whisper_loss=0.09239, over 3925056.16 frames. ], batch size: 62, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:07:29,727 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 from AS 2024-08-13 05:07:45,402 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
24 from LS+wenet, 18 from Vox, 13 from AS 2024-08-13 05:08:12,099 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2024-08-13 05:08:22,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2003790.0, ans=0.125 2024-08-13 05:08:43,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2003890.0, ans=0.1 2024-08-13 05:08:45,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2003890.0, ans=0.125 2024-08-13 05:08:49,880 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12000, loss[loss=0.08909, beats_loss=0.01317, ecapa_loss=0.0001216, whisper_loss=0.0747, over 15931.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01079, ecapa_loss=0.0001696, whisper_loss=0.09291, over 3893053.26 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:08:49,880 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 05:09:29,223 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005731, whisper_loss=0.2468, over 922467.00 frames. 2024-08-13 05:09:48,220 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on SV_voxceleb1: loss=0.004602, beats_loss=0, ecapa_loss=0.0004602, whisper_loss=0, over 939242.00 frames. 2024-08-13 05:11:41,003 INFO [train_multi_KD3.py:1149] (1/4) Epoch 14, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 05:11:41,006 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 05:11:56,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2004090.0, ans=0.125 2024-08-13 05:11:58,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2004090.0, ans=0.07 2024-08-13 05:12:04,041 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 22 from LS+wenet, 18 from Vox, 52 from AS 2024-08-13 05:12:05,721 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 from AS 2024-08-13 05:12:26,579 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS 2024-08-13 05:12:37,126 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 from AS 2024-08-13 05:12:37,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2004290.0, ans=0.0 2024-08-13 05:12:43,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.475e+01 2.665e+01 3.111e+01 1.048e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 05:12:56,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004390.0, ans=0.1 2024-08-13 05:13:04,539 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12050, loss[loss=0.0965, beats_loss=0.01026, ecapa_loss=0.0001624, whisper_loss=0.08461, over 14503.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001694, whisper_loss=0.09246, over 3864999.10 frames. ], batch size: 56, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:13:33,687 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-13 05:13:35,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2004690.0, ans=0.1 2024-08-13 05:13:37,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2004690.0, ans=0.125 2024-08-13 05:13:42,590 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 20 from LS+wenet, 15 from Vox, 45 from AS 2024-08-13 05:13:59,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0 2024-08-13 05:14:15,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2004890.0, ans=0.125 2024-08-13 05:14:26,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2004890.0, ans=0.125 2024-08-13 05:14:28,963 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12100, loss[loss=0.09868, beats_loss=0.01105, ecapa_loss=0.0001418, whisper_loss=0.08621, over 22022.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01093, ecapa_loss=0.000168, whisper_loss=0.09179, over 3883446.80 frames. ], batch size: 87, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:14:42,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2004990.0, ans=0.0 2024-08-13 05:14:42,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2004990.0, ans=0.2 2024-08-13 05:14:44,771 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 from AS 2024-08-13 05:14:45,022 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.524e-01 2024-08-13 05:14:53,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2005090.0, ans=0.0 2024-08-13 05:15:24,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2005290.0, ans=0.125 2024-08-13 05:15:25,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-08-13 05:15:29,358 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.204e-01 2024-08-13 05:15:31,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.461e+01 2.696e+01 3.254e+01 5.243e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-13 05:15:32,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-13 05:15:43,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2005390.0, ans=0.125 2024-08-13 05:15:45,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2005390.0, ans=0.0 2024-08-13 05:15:52,080 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12150, loss[loss=0.105, beats_loss=0.01062, ecapa_loss=0.0001784, whisper_loss=0.09256, over 21752.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001672, whisper_loss=0.09152, over 3864316.39 frames. 
], batch size: 88, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:15:54,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2005490.0, ans=0.1 2024-08-13 05:15:58,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2005490.0, ans=0.0 2024-08-13 05:16:17,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2005590.0, ans=0.2 2024-08-13 05:16:29,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2005690.0, ans=0.125 2024-08-13 05:16:54,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2005790.0, ans=0.125 2024-08-13 05:16:57,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2005790.0, ans=0.125 2024-08-13 05:17:11,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=15.0 2024-08-13 05:17:17,276 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12200, loss[loss=0.09165, beats_loss=0.01246, ecapa_loss=0.0001807, whisper_loss=0.07738, over 18200.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001671, whisper_loss=0.09156, over 3837547.15 frames. ], batch size: 77, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:17:25,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2005990.0, ans=0.2 2024-08-13 05:17:25,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. 
limit=22.5 2024-08-13 05:17:29,733 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 19 from LS+wenet, 30 from Vox, 36 from AS 2024-08-13 05:17:36,974 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 from AS 2024-08-13 05:17:37,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-08-13 05:17:42,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-13 05:17:44,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2006090.0, ans=0.125 2024-08-13 05:17:47,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2006090.0, ans=0.125 2024-08-13 05:18:10,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2006290.0, ans=0.125 2024-08-13 05:18:11,348 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS 2024-08-13 05:18:21,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.467e+01 2.824e+01 3.197e+01 4.821e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 05:18:25,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2006390.0, ans=0.0 2024-08-13 05:18:27,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2006390.0, ans=0.2 2024-08-13 05:18:32,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2024-08-13 05:18:42,539 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12250, loss[loss=0.1163, beats_loss=0.01054, ecapa_loss=0.0001593, whisper_loss=0.1042, over 22363.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001666, whisper_loss=0.09205, over 3872105.36 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:19:06,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2006590.0, ans=0.0 2024-08-13 05:19:18,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2006690.0, ans=0.07 2024-08-13 05:19:51,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2006890.0, ans=0.1 2024-08-13 05:20:01,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2006890.0, ans=0.0 2024-08-13 05:20:03,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2006990.0, ans=0.2 2024-08-13 05:20:04,684 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12300, loss[loss=0.102, beats_loss=0.0116, ecapa_loss=0.0001327, whisper_loss=0.08907, over 23320.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001652, whisper_loss=0.09111, over 3873025.12 frames. ], batch size: 91, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:20:04,791 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 from AS 2024-08-13 05:20:05,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-08-13 05:20:06,406 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 22 from Vox, 27 from AS 2024-08-13 05:20:15,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2006990.0, ans=0.0 2024-08-13 05:20:21,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2007090.0, ans=0.125 2024-08-13 05:21:06,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.461e+01 2.771e+01 3.048e+01 4.529e+01, threshold=5.542e+01, percent-clipped=0.0 2024-08-13 05:21:10,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2007290.0, ans=0.125 2024-08-13 05:21:30,658 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12350, loss[loss=0.09831, beats_loss=0.01274, ecapa_loss=0.0001541, whisper_loss=0.08403, over 21521.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001672, whisper_loss=0.09107, over 3888752.95 frames. ], batch size: 87, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:21:41,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2007490.0, ans=0.1 2024-08-13 05:21:50,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007590.0, ans=0.1 2024-08-13 05:21:51,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2007590.0, ans=0.125 2024-08-13 05:22:02,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2007690.0, ans=0.2 2024-08-13 05:22:50,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2007890.0, ans=0.1 2024-08-13 05:22:51,233 INFO [scaling.py:214] (1/4) ScheduledFloat: 
name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2007890.0, ans=0.0 2024-08-13 05:22:51,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2007890.0, ans=0.125 2024-08-13 05:22:55,535 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12400, loss[loss=0.1132, beats_loss=0.008879, ecapa_loss=0.0001992, whisper_loss=0.1024, over 18879.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01091, ecapa_loss=0.0001677, whisper_loss=0.09023, over 3870112.65 frames. ], batch size: 80, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:23:14,702 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 05:23:20,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2008090.0, ans=0.0 2024-08-13 05:23:23,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=22.5 2024-08-13 05:23:42,187 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 from AS 2024-08-13 05:23:59,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.499e+01 2.802e+01 3.094e+01 1.002e+02, threshold=5.604e+01, percent-clipped=2.0 2024-08-13 05:24:10,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2008390.0, ans=0.125 2024-08-13 05:24:22,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12450, loss[loss=0.1141, beats_loss=0.009471, ecapa_loss=0.000184, whisper_loss=0.1028, over 20536.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01094, ecapa_loss=0.0001675, whisper_loss=0.09046, over 3894730.90 frames. 
], batch size: 85, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:24:23,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2008490.0, ans=0.0 2024-08-13 05:24:28,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2008490.0, ans=0.125 2024-08-13 05:24:34,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2008490.0, ans=0.1 2024-08-13 05:24:37,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2008590.0, ans=0.125 2024-08-13 05:24:40,551 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 05:24:48,825 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 05:24:50,678 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 05:25:01,707 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 05:25:03,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2008690.0, ans=0.1 2024-08-13 05:25:15,447 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 05:25:42,527 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 05:25:50,528 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12500, loss[loss=0.1323, beats_loss=0.007146, ecapa_loss=0.0001393, whisper_loss=0.1238, over 17782.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.0001649, whisper_loss=0.0905, over 3869491.89 frames. 
], batch size: 65, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:26:09,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2009090.0, ans=0.1 2024-08-13 05:26:17,671 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 05:26:34,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.18 vs. limit=12.0 2024-08-13 05:26:35,370 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 05:26:51,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.389e+01 2.676e+01 3.149e+01 9.586e+01, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 05:26:56,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2009390.0, ans=0.1 2024-08-13 05:26:59,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2009390.0, ans=0.2 2024-08-13 05:27:02,846 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 18 from Vox, 53 fro AS 2024-08-13 05:27:11,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=12.0 2024-08-13 05:27:14,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12550, loss[loss=0.139, beats_loss=0.007257, ecapa_loss=0.0002113, whisper_loss=0.1297, over 21989.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01083, ecapa_loss=0.0001662, whisper_loss=0.09187, over 3907197.38 frames. 
], batch size: 91, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:27:18,547 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:27:25,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2009490.0, ans=0.035 2024-08-13 05:28:03,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2009790.0, ans=0.125 2024-08-13 05:28:10,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-08-13 05:28:34,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2009990.0, ans=0.0 2024-08-13 05:28:35,802 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12600, loss[loss=0.1155, beats_loss=0.009432, ecapa_loss=0.0001723, whisper_loss=0.1043, over 17733.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001661, whisper_loss=0.09245, over 3921392.00 frames. ], batch size: 69, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:28:45,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2009990.0, ans=0.0 2024-08-13 05:28:48,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2009990.0, ans=0.0 2024-08-13 05:28:50,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2010090.0, ans=0.125 2024-08-13 05:28:53,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2010090.0, ans=0.0 2024-08-13 05:28:56,944 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 05:28:59,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2010090.0, ans=0.125 2024-08-13 05:29:16,814 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-13 05:29:34,718 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 05:29:36,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.351e+01 2.664e+01 2.979e+01 4.679e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 05:29:43,099 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 05:29:57,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12650, loss[loss=0.1006, beats_loss=0.01137, ecapa_loss=0.0001843, whisper_loss=0.08739, over 17274.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001665, whisper_loss=0.09173, over 3896068.44 frames. ], batch size: 69, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:30:39,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2010690.0, ans=0.1 2024-08-13 05:30:40,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2010690.0, ans=0.125 2024-08-13 05:30:46,426 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 05:30:48,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2010790.0, ans=0.125 2024-08-13 05:30:53,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2010790.0, ans=0.125 2024-08-13 05:31:21,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12700, loss[loss=0.09801, beats_loss=0.01365, ecapa_loss=0.0001302, whisper_loss=0.08306, over 21001.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001664, whisper_loss=0.09186, over 3891778.70 frames. ], batch size: 84, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:31:25,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2010990.0, ans=0.125 2024-08-13 05:31:30,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5 2024-08-13 05:31:36,376 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 05:32:21,912 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.465e+01 2.775e+01 3.008e+01 5.404e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 05:32:30,373 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 05:32:34,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2011390.0, ans=0.95 2024-08-13 05:32:41,936 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 05:32:42,966 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12750, loss[loss=0.0883, beats_loss=0.01258, ecapa_loss=0.0001811, whisper_loss=0.07391, over 20848.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001681, whisper_loss=0.09146, over 3887778.27 frames. ], batch size: 92, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:32:52,146 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 19 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 05:33:16,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-13 05:33:24,028 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 05:33:28,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2011690.0, ans=0.125 2024-08-13 05:33:41,670 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 05:33:46,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2011890.0, ans=0.125 2024-08-13 05:33:48,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2011890.0, ans=0.125 2024-08-13 05:34:03,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12800, loss[loss=0.1088, beats_loss=0.009965, ecapa_loss=0.0001899, whisper_loss=0.09697, over 20297.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.000169, whisper_loss=0.09114, over 3899777.09 frames. ], batch size: 82, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:34:22,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2012090.0, ans=0.2 2024-08-13 05:34:23,464 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 
24 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-13 05:34:34,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2012090.0, ans=0.0 2024-08-13 05:34:44,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2012190.0, ans=0.0 2024-08-13 05:34:53,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2012290.0, ans=0.0 2024-08-13 05:34:58,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2012290.0, ans=0.125 2024-08-13 05:34:59,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2012290.0, ans=0.0 2024-08-13 05:35:03,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2024-08-13 05:35:05,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.426e+01 2.719e+01 3.089e+01 6.356e+01, threshold=5.438e+01, percent-clipped=2.0 2024-08-13 05:35:05,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2012290.0, ans=0.035 2024-08-13 05:35:05,905 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.597e-02 2024-08-13 05:35:19,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2012390.0, ans=0.1 2024-08-13 05:35:27,231 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12850, loss[loss=0.09527, beats_loss=0.01162, ecapa_loss=0.000179, whisper_loss=0.08186, over 22486.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01111, ecapa_loss=0.0001692, whisper_loss=0.09009, over 3902103.43 frames. ], batch size: 95, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:35:35,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2012490.0, ans=0.125 2024-08-13 05:35:39,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2012490.0, ans=0.125 2024-08-13 05:35:44,029 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 05:35:49,850 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 05:35:58,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2012690.0, ans=0.0 2024-08-13 05:36:09,017 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 05:36:09,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2012690.0, ans=0.125 2024-08-13 05:36:17,334 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 05:36:19,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2012790.0, ans=0.125 2024-08-13 05:36:20,477 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 05:36:20,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2012790.0, ans=0.025 2024-08-13 05:36:27,797 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 05:36:31,453 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 05:36:40,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2012890.0, ans=0.5 2024-08-13 05:36:47,373 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12900, loss[loss=0.09492, beats_loss=0.01111, ecapa_loss=0.0001615, whisper_loss=0.08219, over 17507.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01102, ecapa_loss=0.0001697, whisper_loss=0.09035, over 3886235.67 frames. ], batch size: 69, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:36:59,899 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-08-13 05:37:01,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-13 05:37:10,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2013090.0, ans=0.125 2024-08-13 05:37:17,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2013190.0, ans=0.025 2024-08-13 05:37:27,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2013190.0, ans=0.025 2024-08-13 05:37:30,693 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.63 vs. 
limit=22.5 2024-08-13 05:37:44,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.357e+01 2.603e+01 2.918e+01 4.145e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-13 05:38:07,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 12950, loss[loss=0.08325, beats_loss=0.01041, ecapa_loss=0.0001744, whisper_loss=0.0711, over 16051.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01098, ecapa_loss=0.0001693, whisper_loss=0.09034, over 3867579.15 frames. ], batch size: 63, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:38:20,395 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 05:38:29,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2013590.0, ans=0.2 2024-08-13 05:38:40,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-08-13 05:38:55,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2013690.0, ans=0.04949747468305833 2024-08-13 05:39:01,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2013790.0, ans=10.0 2024-08-13 05:39:03,903 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-13 05:39:20,448 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 05:39:27,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2013890.0, ans=0.2 2024-08-13 05:39:30,062 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13000, loss[loss=0.1288, beats_loss=0.01022, ecapa_loss=0.0002015, whisper_loss=0.1166, over 20854.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0001694, whisper_loss=0.09126, over 3877092.84 frames. ], batch size: 86, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:39:49,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2014090.0, ans=0.0 2024-08-13 05:39:50,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=8.0 2024-08-13 05:39:56,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2014090.0, ans=0.0 2024-08-13 05:40:09,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-13 05:40:31,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.457e+01 2.798e+01 3.261e+01 6.703e+01, threshold=5.596e+01, percent-clipped=3.0 2024-08-13 05:40:33,110 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 21 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-13 05:40:36,654 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 05:40:41,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2014390.0, ans=0.125 2024-08-13 05:40:50,803 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-08-13 05:40:52,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13050, loss[loss=0.1027, beats_loss=0.01244, ecapa_loss=0.0001641, whisper_loss=0.08859, over 15764.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01089, ecapa_loss=0.0001691, whisper_loss=0.09079, over 3855751.04 frames. 
], batch size: 63, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:41:01,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2014490.0, ans=0.125 2024-08-13 05:41:17,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2014590.0, ans=0.0 2024-08-13 05:42:08,153 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2024-08-13 05:42:12,424 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13100, loss[loss=0.1195, beats_loss=0.01009, ecapa_loss=0.0001639, whisper_loss=0.1078, over 22500.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01098, ecapa_loss=0.0001676, whisper_loss=0.09045, over 3883164.54 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:42:22,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2024-08-13 05:42:33,729 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-13 05:42:35,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. 
limit=10.0 2024-08-13 05:43:12,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.747e+01 3.007e+01 5.883e+01, threshold=5.493e+01, percent-clipped=1.0 2024-08-13 05:43:30,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2015390.0, ans=0.0 2024-08-13 05:43:33,703 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13150, loss[loss=0.09146, beats_loss=0.01009, ecapa_loss=0.0001535, whisper_loss=0.07983, over 15574.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01092, ecapa_loss=0.0001665, whisper_loss=0.09053, over 3871830.83 frames. ], batch size: 59, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:44:01,073 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 05:44:09,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-08-13 05:44:16,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2015690.0, ans=0.0 2024-08-13 05:44:21,820 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 05:44:22,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2024-08-13 05:44:33,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2015790.0, ans=0.0 2024-08-13 05:44:53,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13200, loss[loss=0.09107, beats_loss=0.01182, ecapa_loss=0.000185, whisper_loss=0.0774, over 14059.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001659, whisper_loss=0.09115, over 3867038.92 frames. 
], batch size: 60, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:44:56,899 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 05:45:14,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2016090.0, ans=0.125 2024-08-13 05:45:42,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2016290.0, ans=0.125 2024-08-13 05:45:43,483 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-13 05:45:46,507 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 23 from LS+wenet, 17 from Vox, 14 fro AS 2024-08-13 05:45:48,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2016290.0, ans=0.125 2024-08-13 05:45:53,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.725e+01 2.981e+01 4.895e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 05:46:04,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2016390.0, ans=0.0 2024-08-13 05:46:14,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13250, loss[loss=0.1051, beats_loss=0.01143, ecapa_loss=0.0001394, whisper_loss=0.09229, over 13886.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001674, whisper_loss=0.09145, over 3851853.97 frames. 
], batch size: 53, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:46:19,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2016490.0, ans=0.025 2024-08-13 05:46:30,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2016590.0, ans=0.125 2024-08-13 05:46:37,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2016590.0, ans=0.1 2024-08-13 05:47:00,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2016690.0, ans=0.125 2024-08-13 05:47:19,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2016790.0, ans=0.125 2024-08-13 05:47:24,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2016890.0, ans=0.0 2024-08-13 05:47:33,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2016890.0, ans=0.1 2024-08-13 05:47:35,170 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 05:47:41,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13300, loss[loss=0.1075, beats_loss=0.01025, ecapa_loss=0.0001725, whisper_loss=0.09553, over 20142.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001672, whisper_loss=0.09173, over 3880890.63 frames. ], batch size: 84, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:47:45,420 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 05:48:07,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2017090.0, ans=0.07 2024-08-13 05:48:23,255 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 05:48:27,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0 2024-08-13 05:48:38,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2017290.0, ans=0.1 2024-08-13 05:48:42,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.445e+01 2.718e+01 3.162e+01 4.686e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-13 05:48:43,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2017290.0, ans=0.07 2024-08-13 05:48:58,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2017390.0, ans=0.2 2024-08-13 05:49:03,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2017490.0, ans=0.2 2024-08-13 05:49:03,867 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13350, loss[loss=0.09732, beats_loss=0.01253, ecapa_loss=0.0001628, whisper_loss=0.08316, over 17078.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001681, whisper_loss=0.09203, over 3891313.12 frames. 
], batch size: 67, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:49:04,421 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:49:22,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2024-08-13 05:49:24,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2017590.0, ans=0.1 2024-08-13 05:49:34,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2017590.0, ans=0.5 2024-08-13 05:49:38,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2017690.0, ans=0.2 2024-08-13 05:49:45,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2017690.0, ans=0.5 2024-08-13 05:49:59,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2017790.0, ans=0.0 2024-08-13 05:50:26,090 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13400, loss[loss=0.1098, beats_loss=0.009294, ecapa_loss=0.0001841, whisper_loss=0.0987, over 17249.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01081, ecapa_loss=0.0001679, whisper_loss=0.09188, over 3880496.20 frames. ], batch size: 68, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:50:50,809 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 05:51:19,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. 
limit=12.0 2024-08-13 05:51:28,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2018290.0, ans=0.125 2024-08-13 05:51:28,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.489e+01 2.760e+01 3.071e+01 5.716e+01, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 05:51:50,207 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13450, loss[loss=0.1259, beats_loss=0.006892, ecapa_loss=0.0001602, whisper_loss=0.1174, over 19856.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001682, whisper_loss=0.09115, over 3897973.05 frames. ], batch size: 73, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:51:54,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2018490.0, ans=0.125 2024-08-13 05:52:36,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2018690.0, ans=0.0 2024-08-13 05:52:47,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2018790.0, ans=0.0 2024-08-13 05:53:14,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13500, loss[loss=0.07898, beats_loss=0.0117, ecapa_loss=0.0001877, whisper_loss=0.0654, over 15751.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001695, whisper_loss=0.09203, over 3904074.22 frames. ], batch size: 66, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:53:18,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2018990.0, ans=0.125 2024-08-13 05:53:28,037 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 05:53:43,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2019090.0, ans=0.0 2024-08-13 05:53:45,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2019090.0, ans=0.1 2024-08-13 05:54:08,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-08-13 05:54:17,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.519e+01 2.845e+01 3.228e+01 5.669e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 05:54:22,840 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2024-08-13 05:54:32,671 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 05:54:39,044 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13550, loss[loss=0.1111, beats_loss=0.009972, ecapa_loss=0.0001485, whisper_loss=0.09963, over 21395.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001689, whisper_loss=0.09191, over 3889590.13 frames. ], batch size: 84, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:54:41,298 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 05:54:55,759 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 05:55:12,535 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 05:55:19,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2019690.0, ans=0.0 2024-08-13 05:55:23,873 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 05:55:25,415 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 05:55:44,335 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 05:55:46,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2024-08-13 05:55:49,077 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 05:55:59,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2019890.0, ans=0.0 2024-08-13 05:56:02,569 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13600, loss[loss=0.1164, beats_loss=0.01034, ecapa_loss=0.0001378, whisper_loss=0.1047, over 18438.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.000168, whisper_loss=0.0918, over 3903883.81 frames. ], batch size: 71, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:56:05,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.56 vs. 
limit=15.0 2024-08-13 05:56:28,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2020090.0, ans=0.07 2024-08-13 05:56:31,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2020090.0, ans=0.125 2024-08-13 05:56:33,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.53 vs. limit=10.0 2024-08-13 05:56:41,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-13 05:56:59,439 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 05:57:01,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2020290.0, ans=0.125 2024-08-13 05:57:03,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=12.0 2024-08-13 05:57:03,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.439e+01 2.789e+01 3.158e+01 4.809e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 05:57:17,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2020390.0, ans=0.0 2024-08-13 05:57:25,492 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13650, loss[loss=0.09991, beats_loss=0.01306, ecapa_loss=0.0001434, whisper_loss=0.08541, over 22961.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01087, ecapa_loss=0.0001686, whisper_loss=0.09181, over 3909970.51 frames. 
], batch size: 93, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:57:29,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2020490.0, ans=0.0 2024-08-13 05:57:45,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2020590.0, ans=0.0 2024-08-13 05:57:50,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2020590.0, ans=0.125 2024-08-13 05:58:37,238 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 05:58:45,111 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13700, loss[loss=0.09853, beats_loss=0.01094, ecapa_loss=0.0001615, whisper_loss=0.08598, over 20825.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001686, whisper_loss=0.09214, over 3899890.22 frames. ], batch size: 83, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:58:47,098 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 18 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 05:58:58,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2020990.0, ans=0.0 2024-08-13 05:59:18,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2021190.0, ans=0.0 2024-08-13 05:59:27,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.35 vs. 
limit=10.0 2024-08-13 05:59:29,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2021290.0, ans=0.025 2024-08-13 05:59:40,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.485e+01 2.717e+01 3.143e+01 5.833e+01, threshold=5.434e+01, percent-clipped=2.0 2024-08-13 05:59:46,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2021390.0, ans=0.125 2024-08-13 05:59:48,088 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 05:59:58,628 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13750, loss[loss=0.08578, beats_loss=0.0114, ecapa_loss=0.0001851, whisper_loss=0.07253, over 17934.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.000168, whisper_loss=0.09175, over 3880092.12 frames. ], batch size: 78, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:00:00,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2021490.0, ans=0.2 2024-08-13 06:00:05,199 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 06:00:08,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2021490.0, ans=0.1 2024-08-13 06:00:15,352 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 06:00:25,060 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 06:00:25,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2021690.0, ans=0.125 2024-08-13 06:00:25,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2021690.0, ans=0.0 2024-08-13 06:00:32,093 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 06:00:48,199 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 06:00:50,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2021790.0, ans=0.09899494936611666 2024-08-13 06:01:07,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13800, loss[loss=0.1014, beats_loss=0.009036, ecapa_loss=0.0002194, whisper_loss=0.09019, over 20099.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01085, ecapa_loss=0.0001667, whisper_loss=0.09171, over 3889408.76 frames. ], batch size: 82, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:01:28,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2022090.0, ans=0.0 2024-08-13 06:01:33,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.21 vs. 
limit=15.0 2024-08-13 06:01:34,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2022190.0, ans=0.125 2024-08-13 06:01:37,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2022190.0, ans=0.0 2024-08-13 06:01:57,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.696e+01 2.984e+01 4.554e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-13 06:02:15,277 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13850, loss[loss=0.1006, beats_loss=0.009551, ecapa_loss=0.0001598, whisper_loss=0.08943, over 15679.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001666, whisper_loss=0.09199, over 3874364.49 frames. ], batch size: 59, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:02:16,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2022490.0, ans=0.125 2024-08-13 06:02:21,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2022490.0, ans=0.125 2024-08-13 06:02:30,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-13 06:02:31,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2022590.0, ans=0.125 2024-08-13 06:02:41,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2022590.0, ans=0.0 2024-08-13 06:02:50,023 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 
23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 06:02:51,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2022690.0, ans=0.125 2024-08-13 06:03:09,850 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-13 06:03:24,530 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13900, loss[loss=0.1113, beats_loss=0.009511, ecapa_loss=0.0001817, whisper_loss=0.09994, over 19282.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001673, whisper_loss=0.09163, over 3850462.04 frames. ], batch size: 76, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:03:37,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2023090.0, ans=0.025 2024-08-13 06:03:37,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2023090.0, ans=0.125 2024-08-13 06:03:38,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2023090.0, ans=0.125 2024-08-13 06:03:45,734 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 06:04:06,843 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. 
limit=15.0 2024-08-13 06:04:10,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2023290.0, ans=0.125 2024-08-13 06:04:13,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2023290.0, ans=0.1 2024-08-13 06:04:15,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.449e+01 2.734e+01 3.123e+01 1.484e+02, threshold=5.468e+01, percent-clipped=1.0 2024-08-13 06:04:33,783 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 13950, loss[loss=0.08906, beats_loss=0.01003, ecapa_loss=0.0001707, whisper_loss=0.07733, over 22688.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.0001664, whisper_loss=0.09163, over 3881951.48 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:04:46,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2023590.0, ans=0.0 2024-08-13 06:04:47,909 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 06:05:00,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2023690.0, ans=0.125 2024-08-13 06:05:05,621 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 06:05:14,974 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 06:05:28,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. 
limit=6.0 2024-08-13 06:05:31,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2023890.0, ans=0.2 2024-08-13 06:05:39,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2023890.0, ans=0.125 2024-08-13 06:05:39,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2024-08-13 06:05:41,539 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14000, loss[loss=0.1033, beats_loss=0.01011, ecapa_loss=0.0001742, whisper_loss=0.09148, over 22788.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001649, whisper_loss=0.09195, over 3879336.84 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:05:43,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2023990.0, ans=0.2 2024-08-13 06:05:46,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2024-08-13 06:05:50,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2023990.0, ans=0.1 2024-08-13 06:05:52,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2023990.0, ans=0.1 2024-08-13 06:06:27,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2024290.0, ans=0.0 2024-08-13 06:06:28,606 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 06:06:32,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.439e+01 2.688e+01 3.210e+01 4.383e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 06:06:46,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2024390.0, ans=0.1 2024-08-13 06:06:48,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-08-13 06:06:50,557 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14050, loss[loss=0.06658, beats_loss=0.01511, ecapa_loss=0.0001251, whisper_loss=0.05022, over 14767.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.000165, whisper_loss=0.09147, over 3881593.88 frames. ], batch size: 59, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:06:55,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2024490.0, ans=0.125 2024-08-13 06:07:01,567 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 06:07:09,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.22 vs. limit=22.5 2024-08-13 06:07:19,702 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 06:07:25,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2024690.0, ans=0.0 2024-08-13 06:07:47,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2024890.0, ans=0.1 2024-08-13 06:07:52,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2024-08-13 06:07:59,612 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14100, loss[loss=0.1222, beats_loss=0.01037, ecapa_loss=0.0001617, whisper_loss=0.1102, over 23613.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001648, whisper_loss=0.09173, over 3905241.79 frames. ], batch size: 93, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:08:03,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2024990.0, ans=0.1 2024-08-13 06:08:08,728 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 06:08:24,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2025090.0, ans=0.125 2024-08-13 06:08:25,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. 
limit=15.0 2024-08-13 06:08:28,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2025190.0, ans=10.0 2024-08-13 06:08:37,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2025190.0, ans=0.0 2024-08-13 06:08:51,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.495e+01 2.684e+01 2.972e+01 8.600e+01, threshold=5.367e+01, percent-clipped=1.0 2024-08-13 06:08:56,526 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-13 06:09:07,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2025390.0, ans=0.0 2024-08-13 06:09:09,035 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14150, loss[loss=0.09811, beats_loss=0.01352, ecapa_loss=0.0001643, whisper_loss=0.08294, over 18159.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01105, ecapa_loss=0.0001652, whisper_loss=0.09077, over 3891584.70 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:09:11,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-13 06:09:21,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2025590.0, ans=0.125 2024-08-13 06:09:23,349 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.533e-03 2024-08-13 06:09:29,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.44 vs. 
limit=15.0 2024-08-13 06:09:29,863 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 06:09:35,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2025690.0, ans=0.0 2024-08-13 06:09:50,472 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 06:09:53,155 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 32 from Vox, 25 fro AS 2024-08-13 06:09:55,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-13 06:10:04,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2025890.0, ans=0.0 2024-08-13 06:10:08,423 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 06:10:14,933 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 27 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 06:10:17,490 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14200, loss[loss=0.08634, beats_loss=0.01021, ecapa_loss=0.000143, whisper_loss=0.0747, over 18083.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01109, ecapa_loss=0.0001642, whisper_loss=0.09002, over 3887222.94 frames. ], batch size: 68, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:10:28,112 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 06:10:55,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2026190.0, ans=0.025 2024-08-13 06:11:02,598 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 06:11:07,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.457e+01 2.666e+01 2.949e+01 5.330e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-13 06:11:09,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0 2024-08-13 06:11:12,313 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 33 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 06:11:15,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2026390.0, ans=0.95 2024-08-13 06:11:16,492 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 06:11:25,879 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14250, loss[loss=0.1114, beats_loss=0.01093, ecapa_loss=0.0001609, whisper_loss=0.09887, over 23700.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01095, ecapa_loss=0.0001651, whisper_loss=0.0908, over 3897230.16 frames. ], batch size: 92, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:11:34,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2026490.0, ans=0.2 2024-08-13 06:11:40,855 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 06:11:42,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2026590.0, ans=0.125 2024-08-13 06:11:43,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2026590.0, ans=0.0 2024-08-13 06:11:50,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2026590.0, ans=0.125 2024-08-13 06:12:00,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.92 vs. limit=10.0 2024-08-13 06:12:05,811 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:12:05,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2026790.0, ans=0.1 2024-08-13 06:12:06,976 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-13 06:12:18,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-13 06:12:31,735 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 06:12:34,319 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14300, loss[loss=0.1069, beats_loss=0.009231, ecapa_loss=0.0001528, whisper_loss=0.09611, over 16288.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01095, ecapa_loss=0.0001646, whisper_loss=0.09026, over 3877208.65 frames. ], batch size: 64, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:12:39,881 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
26 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 06:12:41,292 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 06:12:55,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=12.0 2024-08-13 06:12:59,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-13 06:13:06,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2027190.0, ans=0.125 2024-08-13 06:13:16,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2027290.0, ans=0.07 2024-08-13 06:13:22,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-13 06:13:24,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.493e+01 2.791e+01 3.138e+01 4.573e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 06:13:41,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14350, loss[loss=0.08733, beats_loss=0.01311, ecapa_loss=0.000186, whisper_loss=0.07236, over 20998.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0109, ecapa_loss=0.000166, whisper_loss=0.09066, over 3887829.56 frames. ], batch size: 88, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:13:46,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2027490.0, ans=0.0 2024-08-13 06:13:46,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. 
limit=15.0 2024-08-13 06:13:47,535 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 13 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 06:13:54,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2027590.0, ans=0.2 2024-08-13 06:14:04,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2027590.0, ans=0.1 2024-08-13 06:14:28,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2027790.0, ans=0.0 2024-08-13 06:14:38,638 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 06:14:41,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2027890.0, ans=0.125 2024-08-13 06:14:46,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2027890.0, ans=0.1 2024-08-13 06:14:52,193 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14400, loss[loss=0.1167, beats_loss=0.009586, ecapa_loss=0.0001863, whisper_loss=0.1052, over 23428.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01094, ecapa_loss=0.000167, whisper_loss=0.09028, over 3901909.51 frames. 
], batch size: 93, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:15:01,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2027990.0, ans=0.125 2024-08-13 06:15:01,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2027990.0, ans=0.0 2024-08-13 06:15:06,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2028090.0, ans=0.2 2024-08-13 06:15:16,634 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 06:15:23,776 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 06:15:45,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2028290.0, ans=0.2 2024-08-13 06:15:46,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.480e+01 2.712e+01 3.054e+01 1.079e+02, threshold=5.424e+01, percent-clipped=2.0 2024-08-13 06:16:06,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 14, batch 14450, loss[loss=0.09546, beats_loss=0.01185, ecapa_loss=0.0002194, whisper_loss=0.08142, over 20549.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01096, ecapa_loss=0.0001679, whisper_loss=0.08972, over 3849796.38 frames. 
], batch size: 92, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:16:07,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2028490.0, ans=0.125 2024-08-13 06:16:08,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2028490.0, ans=0.2 2024-08-13 06:16:11,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2028490.0, ans=0.0 2024-08-13 06:16:17,583 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 06:16:47,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2028690.0, ans=0.1 2024-08-13 06:17:07,843 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-13 06:17:52,542 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 0, loss[loss=0.1133, beats_loss=0.008997, ecapa_loss=0.0001741, whisper_loss=0.1025, over 23187.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.008997, ecapa_loss=0.0001741, whisper_loss=0.1025, over 23187.00 frames. ], batch size: 90, lr: 4.26e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:17:52,542 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 06:18:35,157 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005623, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 06:18:51,946 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on SV_voxceleb1: loss=0.004582, beats_loss=0, ecapa_loss=0.0004582, whisper_loss=0, over 939242.00 frames. 
2024-08-13 06:20:54,422 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on AT_audioset: loss=0.02384, beats_loss=0.02384, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 06:20:54,425 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB
2024-08-13 06:21:23,568 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 from AS
2024-08-13 06:21:33,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2029030.0, ans=0.0
2024-08-13 06:21:36,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0
2024-08-13 06:21:39,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5
2024-08-13 06:21:42,187 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 from AS
2024-08-13 06:22:03,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=12.0
2024-08-13 06:22:27,360 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.323e+00
2024-08-13 06:22:30,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2029230.0, ans=0.0
2024-08-13 06:22:37,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2029230.0, ans=0.125
2024-08-13 06:22:48,918 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.538e+01 2.901e+01 3.195e+01 5.923e+01, threshold=5.802e+01, percent-clipped=1.0
2024-08-13 06:22:49,393 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 from AS
2024-08-13 06:23:05,529 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 50, loss[loss=0.0857, beats_loss=0.01147, ecapa_loss=0.000131, whisper_loss=0.07292, over 22371.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01021, ecapa_loss=0.0001644, whisper_loss=0.08897, over 907689.05 frames. ], batch size: 86, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:23:09,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2029430.0, ans=0.125
2024-08-13 06:23:28,356 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-13 06:23:41,116 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 06:23:50,827 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-13 06:25:04,431 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 100, loss[loss=0.1137, beats_loss=0.01084, ecapa_loss=0.0001408, whisper_loss=0.1015, over 23108.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.009979, ecapa_loss=0.0001665, whisper_loss=0.08954, over 1545724.72 frames. ], batch size: 89, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:25:37,896 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 from AS
2024-08-13 06:25:45,025 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 18 from Vox, 50 from AS
2024-08-13 06:25:48,668 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 06:26:20,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2030230.0, ans=0.05
2024-08-13 06:26:42,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.792e+01 3.150e+01 3.564e+01 5.697e+01, threshold=6.299e+01, percent-clipped=0.0
2024-08-13 06:26:56,782 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 150, loss[loss=0.1161, beats_loss=0.01062, ecapa_loss=0.000134, whisper_loss=0.1042, over 18750.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01004, ecapa_loss=0.0001661, whisper_loss=0.09135, over 2070731.75 frames. ], batch size: 71, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:27:05,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2030430.0, ans=0.05
2024-08-13 06:27:25,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.48 vs. limit=22.5
2024-08-13 06:27:40,738 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 25 from Vox, 25 from AS
2024-08-13 06:28:03,577 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS
2024-08-13 06:28:08,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2030730.0, ans=0.1
2024-08-13 06:28:27,556 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 200, loss[loss=0.1289, beats_loss=0.01038, ecapa_loss=0.0001576, whisper_loss=0.117, over 23432.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01002, ecapa_loss=0.0001671, whisper_loss=0.09242, over 2438226.58 frames.
], batch size: 90, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:28:45,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0
2024-08-13 06:29:05,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2031130.0, ans=0.0
2024-08-13 06:29:38,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2031330.0, ans=0.125
2024-08-13 06:29:39,732 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.436e+01 2.755e+01 3.099e+01 4.760e+01, threshold=5.509e+01, percent-clipped=0.0
2024-08-13 06:29:51,978 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 250, loss[loss=0.137, beats_loss=0.006664, ecapa_loss=0.0001821, whisper_loss=0.1285, over 15169.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01023, ecapa_loss=0.000168, whisper_loss=0.09231, over 2742940.64 frames. ], batch size: 57, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:30:16,751 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 from AS
2024-08-13 06:30:39,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2031730.0, ans=0.125
2024-08-13 06:30:41,143 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 from AS
2024-08-13 06:30:51,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2031730.0, ans=0.125
2024-08-13 06:31:03,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2031830.0, ans=0.125
2024-08-13 06:31:04,487 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 06:31:13,741 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 300, loss[loss=0.1154, beats_loss=0.00769, ecapa_loss=0.0002155, whisper_loss=0.1056, over 15706.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01029, ecapa_loss=0.0001681, whisper_loss=0.09246, over 2988420.69 frames. ], batch size: 61, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:31:32,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2032030.0, ans=0.0
2024-08-13 06:31:40,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0
2024-08-13 06:31:48,633 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06791721284389496, model_norm_threshold=55.09401321411133
2024-08-13 06:31:48,832 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.429e+05, grad_sumsq=7.164e+04, orig_rms_sq=8.974e+00
2024-08-13 06:32:15,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2032230.0, ans=0.1
2024-08-13 06:32:25,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.443e+01 2.713e+01 2.990e+01 8.112e+02, threshold=5.427e+01, percent-clipped=1.0
2024-08-13 06:32:35,773 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 12 from LS+wenet, 17 from Vox, 24 from AS
2024-08-13 06:32:37,012 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 350, loss[loss=0.07662, beats_loss=0.0119, ecapa_loss=0.000181, whisper_loss=0.06291, over 13144.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01041, ecapa_loss=0.0001659, whisper_loss=0.0923, over 3193379.26 frames. ], batch size: 53, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:33:35,530 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 from AS
2024-08-13 06:33:42,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2032830.0, ans=0.2
2024-08-13 06:33:57,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 400, loss[loss=0.1168, beats_loss=0.009759, ecapa_loss=0.0001182, whisper_loss=0.1058, over 16118.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01048, ecapa_loss=0.0001643, whisper_loss=0.09211, over 3331105.44 frames. ], batch size: 57, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:34:03,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2032930.0, ans=0.05
2024-08-13 06:34:24,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2033030.0, ans=0.0
2024-08-13 06:34:44,439 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 30 from Vox, 24 from AS
2024-08-13 06:35:02,559 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 from AS
2024-08-13 06:35:04,059 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS
2024-08-13 06:35:06,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.548e+01 2.826e+01 3.113e+01 9.410e+01, threshold=5.653e+01, percent-clipped=3.0
2024-08-13 06:35:11,585 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
32 from LS+wenet, 25 from Vox, 34 from AS
2024-08-13 06:35:13,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2033330.0, ans=0.0
2024-08-13 06:35:17,993 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 450, loss[loss=0.104, beats_loss=0.009154, ecapa_loss=0.0002023, whisper_loss=0.09283, over 21414.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01056, ecapa_loss=0.0001653, whisper_loss=0.09182, over 3450782.74 frames. ], batch size: 90, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:35:32,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2033530.0, ans=0.125
2024-08-13 06:35:39,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2033530.0, ans=0.0
2024-08-13 06:35:55,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2033630.0, ans=0.125
2024-08-13 06:36:05,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2033730.0, ans=0.09899494936611666
2024-08-13 06:36:26,879 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0
2024-08-13 06:36:37,348 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 500, loss[loss=0.1038, beats_loss=0.01056, ecapa_loss=0.0001516, whisper_loss=0.09172, over 18122.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001648, whisper_loss=0.09163, over 3547692.02 frames. ], batch size: 70, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:36:43,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2033930.0, ans=0.125
2024-08-13 06:36:51,352 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 25 from LS+wenet, 12 from Vox, 16 from AS
2024-08-13 06:36:57,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0
2024-08-13 06:37:06,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2034130.0, ans=0.125
2024-08-13 06:37:15,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0
2024-08-13 06:37:16,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2034130.0, ans=0.04949747468305833
2024-08-13 06:37:18,909 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 06:37:34,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2034230.0, ans=0.0
2024-08-13 06:37:44,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2034330.0, ans=0.05
2024-08-13 06:37:45,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.386e+01 2.704e+01 2.981e+01 6.756e+01, threshold=5.408e+01, percent-clipped=1.0
2024-08-13 06:37:56,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 550, loss[loss=0.1041, beats_loss=0.01131, ecapa_loss=0.0001652, whisper_loss=0.09117, over 19254.00 frames.
], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001636, whisper_loss=0.09116, over 3600251.51 frames. ], batch size: 76, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:38:11,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2034530.0, ans=0.0
2024-08-13 06:38:12,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2034530.0, ans=0.125
2024-08-13 06:38:31,108 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 06:38:36,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2034630.0, ans=0.0
2024-08-13 06:38:36,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2034630.0, ans=0.2
2024-08-13 06:38:37,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2034630.0, ans=0.1
2024-08-13 06:38:41,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0
2024-08-13 06:39:11,162 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 from AS
2024-08-13 06:39:17,194 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 600, loss[loss=0.1159, beats_loss=0.00884, ecapa_loss=0.000173, whisper_loss=0.1054, over 16273.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001624, whisper_loss=0.09117, over 3663533.69 frames. ], batch size: 62, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:39:28,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2034930.0, ans=0.125
2024-08-13 06:39:39,193 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 06:39:44,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2035030.0, ans=0.125
2024-08-13 06:39:48,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5
2024-08-13 06:39:55,054 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 06:39:57,452 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 26 from Vox, 28 from AS
2024-08-13 06:40:03,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2035230.0, ans=0.1
2024-08-13 06:40:18,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2024-08-13 06:40:25,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0
2024-08-13 06:40:25,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.434e+01 2.721e+01 3.072e+01 6.546e+01, threshold=5.441e+01, percent-clipped=1.0
2024-08-13 06:40:32,867 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS
2024-08-13 06:40:37,792 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 650, loss[loss=0.1016, beats_loss=0.01135, ecapa_loss=0.0001617, whisper_loss=0.08867, over 18922.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001633, whisper_loss=0.09125, over 3700918.64 frames. ], batch size: 75, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:40:54,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0
2024-08-13 06:41:02,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2035530.0, ans=0.125
2024-08-13 06:41:02,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2035530.0, ans=0.1
2024-08-13 06:41:14,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2035630.0, ans=0.025
2024-08-13 06:41:18,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2035630.0, ans=0.0
2024-08-13 06:41:30,364 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs.
limit=15.0
2024-08-13 06:41:33,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2035730.0, ans=0.125
2024-08-13 06:41:36,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2035730.0, ans=0.1
2024-08-13 06:41:36,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5
2024-08-13 06:41:53,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2035830.0, ans=0.125
2024-08-13 06:41:59,050 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 700, loss[loss=0.118, beats_loss=0.008341, ecapa_loss=0.0002113, whisper_loss=0.1075, over 18297.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001628, whisper_loss=0.09132, over 3715509.29 frames. ], batch size: 74, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:42:03,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5
2024-08-13 06:42:08,092 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 06:42:09,441 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-13 06:42:17,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2036030.0, ans=0.125
2024-08-13 06:42:44,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0
2024-08-13 06:43:04,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0
2024-08-13 06:43:05,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2036330.0, ans=0.125
2024-08-13 06:43:08,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.352e+01 2.612e+01 3.001e+01 5.116e+01, threshold=5.224e+01, percent-clipped=0.0
2024-08-13 06:43:19,770 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 750, loss[loss=0.08214, beats_loss=0.01147, ecapa_loss=0.0001979, whisper_loss=0.06868, over 16639.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001626, whisper_loss=0.09147, over 3741874.86 frames. ], batch size: 71, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:43:30,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2036430.0, ans=0.1
2024-08-13 06:43:34,702 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 from AS
2024-08-13 06:43:53,356 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 28 from LS+wenet, 10 from Vox, 35 from AS
2024-08-13 06:43:57,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2036630.0, ans=0.0
2024-08-13 06:44:08,889 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 from AS
2024-08-13 06:44:23,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2036830.0, ans=0.2
2024-08-13 06:44:31,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-08-13 06:44:37,084 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 800, loss[loss=0.07655, beats_loss=0.01212, ecapa_loss=0.0001314, whisper_loss=0.06312, over 17064.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.000163, whisper_loss=0.09056, over 3746483.75 frames. ], batch size: 65, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:44:53,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2037030.0, ans=0.0
2024-08-13 06:44:59,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2037030.0, ans=0.0
2024-08-13 06:45:27,003 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS
2024-08-13 06:45:27,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2037230.0, ans=0.125
2024-08-13 06:45:28,816 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 06:45:32,186 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 06:45:43,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.386e+01 2.631e+01 2.954e+01 1.989e+02, threshold=5.262e+01, percent-clipped=2.0
2024-08-13 06:45:43,459 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 from AS
2024-08-13 06:45:48,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2037330.0, ans=0.125
2024-08-13 06:45:53,637 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 850, loss[loss=0.08888, beats_loss=0.01377, ecapa_loss=0.0001504, whisper_loss=0.07361, over 18635.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001634, whisper_loss=0.08982, over 3796826.43 frames. ], batch size: 76, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:46:00,169 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 19 from Vox, 43 from AS
2024-08-13 06:46:01,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2037430.0, ans=0.125
2024-08-13 06:46:16,477 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5
2024-08-13 06:46:25,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2037630.0, ans=0.2
2024-08-13 06:46:43,653 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 from AS
2024-08-13 06:46:51,742 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 06:46:58,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2037830.0, ans=0.025
2024-08-13 06:47:10,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 900, loss[loss=0.09555, beats_loss=0.01134, ecapa_loss=0.0001441, whisper_loss=0.08277, over 22806.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001619, whisper_loss=0.0899, over 3814793.84 frames. ], batch size: 91, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:47:12,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2037930.0, ans=0.125
2024-08-13 06:47:24,950 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 from AS
2024-08-13 06:47:57,747 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 06:48:05,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.65 vs. limit=10.0
2024-08-13 06:48:12,456 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.393e+01 2.649e+01 3.126e+01 8.192e+01, threshold=5.298e+01, percent-clipped=1.0
2024-08-13 06:48:17,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2038330.0, ans=0.125
2024-08-13 06:48:22,630 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 950, loss[loss=0.08615, beats_loss=0.01534, ecapa_loss=0.0001542, whisper_loss=0.06926, over 15646.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001613, whisper_loss=0.08925, over 3812907.47 frames. ], batch size: 63, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:48:26,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2038430.0, ans=0.07
2024-08-13 06:48:39,522 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 06:48:49,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2038530.0, ans=0.0
2024-08-13 06:49:18,958 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 06:49:19,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2038730.0, ans=0.125
2024-08-13 06:49:31,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2038830.0, ans=0.125
2024-08-13 06:49:32,370 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-13 06:49:35,484 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 from AS
2024-08-13 06:49:44,435 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1000, loss[loss=0.1017, beats_loss=0.01141, ecapa_loss=0.0001531, whisper_loss=0.08872, over 22811.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01077, ecapa_loss=0.0001617, whisper_loss=0.08963, over 3788781.31 frames. ], batch size: 90, lr: 4.25e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:49:46,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2038930.0, ans=0.0
2024-08-13 06:49:50,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2038930.0, ans=0.07
2024-08-13 06:49:51,514 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.967e-02
2024-08-13 06:49:54,749 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 31 from Vox, 30 from AS
2024-08-13 06:50:00,190 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 06:50:08,261 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0
2024-08-13 06:50:09,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2039030.0, ans=0.1
2024-08-13 06:50:16,514 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs.
limit=15.0
2024-08-13 06:50:26,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2039130.0, ans=0.0
2024-08-13 06:50:46,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2024-08-13 06:50:50,286 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 from AS
2024-08-13 06:50:55,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.424e+01 2.734e+01 3.160e+01 9.771e+01, threshold=5.467e+01, percent-clipped=3.0
2024-08-13 06:51:05,540 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1050, loss[loss=0.09541, beats_loss=0.009669, ecapa_loss=0.0001972, whisper_loss=0.08376, over 15240.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.000161, whisper_loss=0.08969, over 3820324.71 frames. ], batch size: 62, lr: 4.25e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:51:26,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2039530.0, ans=0.0
2024-08-13 06:51:29,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2039530.0, ans=0.125
2024-08-13 06:51:37,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2039630.0, ans=0.2
2024-08-13 06:51:52,072 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 14 from Vox, 24 from AS
2024-08-13 06:52:13,932 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 30 from LS+wenet, 15 from Vox, 31 from AS
2024-08-13 06:52:20,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1100, loss[loss=0.1132, beats_loss=0.01005, ecapa_loss=0.0001353, whisper_loss=0.1018, over 23808.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01083, ecapa_loss=0.0001603, whisper_loss=0.08974, over 3853004.91 frames. ], batch size: 89, lr: 4.25e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:52:32,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0
2024-08-13 06:52:36,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2040030.0, ans=0.125
2024-08-13 06:52:40,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0
2024-08-13 06:52:44,084 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS
2024-08-13 06:52:54,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0
2024-08-13 06:52:56,037 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 06:52:57,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2040130.0, ans=0.0
2024-08-13 06:53:05,673 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 06:53:09,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2040230.0, ans=0.5
2024-08-13 06:53:24,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.366e+01 2.661e+01 3.055e+01 5.230e+01, threshold=5.322e+01, percent-clipped=0.0
2024-08-13 06:53:32,926 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1150, loss[loss=0.08686, beats_loss=0.01367, ecapa_loss=0.0001237, whisper_loss=0.07195, over 22884.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.0001602, whisper_loss=0.09031, over 3850686.68 frames. ], batch size: 91, lr: 4.25e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:53:46,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0
2024-08-13 06:53:50,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.33 vs. limit=22.5
2024-08-13 06:53:53,743 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-13 06:53:56,590 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 from AS
2024-08-13 06:54:07,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2040630.0, ans=0.125
2024-08-13 06:54:07,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2040630.0, ans=0.025
2024-08-13 06:54:15,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2040730.0, ans=0.0
2024-08-13 06:54:16,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2040730.0, ans=0.125
2024-08-13 06:54:20,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2040730.0, ans=0.125
2024-08-13 06:54:31,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2040830.0, ans=0.0
2024-08-13 06:54:31,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2040830.0, ans=0.125
2024-08-13 06:54:31,090 INFO
[scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2040830.0, ans=0.125 2024-08-13 06:54:35,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2040830.0, ans=0.125 2024-08-13 06:54:40,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2040830.0, ans=0.035 2024-08-13 06:54:44,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2040930.0, ans=0.1 2024-08-13 06:54:45,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1200, loss[loss=0.1064, beats_loss=0.008776, ecapa_loss=0.0001762, whisper_loss=0.09586, over 14782.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01076, ecapa_loss=0.000161, whisper_loss=0.09034, over 3815766.76 frames. ], batch size: 59, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:55:01,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2041030.0, ans=0.125 2024-08-13 06:55:03,008 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-13 06:55:22,144 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 06:55:32,351 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 06:55:38,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2041230.0, ans=0.0 2024-08-13 06:55:38,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2041230.0, ans=0.2 2024-08-13 06:55:42,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2041330.0, ans=0.0 2024-08-13 06:55:46,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.369e+01 2.676e+01 3.078e+01 7.518e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 06:55:53,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2041430.0, ans=0.125 2024-08-13 06:55:54,710 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1250, loss[loss=0.1047, beats_loss=0.01307, ecapa_loss=0.000142, whisper_loss=0.09023, over 21517.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01088, ecapa_loss=0.0001587, whisper_loss=0.08942, over 3812853.96 frames. ], batch size: 85, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:56:00,023 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 22 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-13 06:56:09,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2041530.0, ans=0.1 2024-08-13 06:56:11,933 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
24 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 06:56:12,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2041530.0, ans=0.125 2024-08-13 06:56:18,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-13 06:57:01,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1300, loss[loss=0.1104, beats_loss=0.009947, ecapa_loss=0.0001341, whisper_loss=0.09915, over 20199.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001599, whisper_loss=0.09085, over 3817475.87 frames. ], batch size: 74, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:57:08,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2041930.0, ans=0.125 2024-08-13 06:57:09,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2041930.0, ans=0.125 2024-08-13 06:57:10,711 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 06:57:26,610 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-13 06:57:32,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2042130.0, ans=0.1 2024-08-13 06:57:32,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2042130.0, ans=0.1 2024-08-13 06:57:33,030 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 06:57:33,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2042130.0, ans=0.09899494936611666 2024-08-13 06:57:58,090 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 06:57:59,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.323e+01 2.630e+01 3.145e+01 6.794e+01, threshold=5.259e+01, percent-clipped=2.0 2024-08-13 06:58:01,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2042330.0, ans=0.0 2024-08-13 06:58:01,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=119.36 vs. limit=22.5 2024-08-13 06:58:06,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2042430.0, ans=0.125 2024-08-13 06:58:07,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1350, loss[loss=0.06656, beats_loss=0.01428, ecapa_loss=0.000143, whisper_loss=0.05085, over 14371.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001597, whisper_loss=0.09022, over 3829054.64 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:58:14,937 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 06:58:44,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2042630.0, ans=0.125 2024-08-13 06:58:57,261 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 
23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 06:59:04,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2042830.0, ans=0.0 2024-08-13 06:59:07,921 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 06:59:13,146 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1400, loss[loss=0.1008, beats_loss=0.009212, ecapa_loss=0.0002157, whisper_loss=0.0894, over 16460.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01076, ecapa_loss=0.0001603, whisper_loss=0.09055, over 3827051.05 frames. ], batch size: 68, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:59:13,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2042930.0, ans=0.1 2024-08-13 06:59:17,566 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-13 06:59:21,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2042930.0, ans=0.1 2024-08-13 06:59:28,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2043030.0, ans=0.0 2024-08-13 06:59:46,083 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 06:59:50,244 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 30 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 06:59:58,204 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 24 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-13 07:00:05,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2043230.0, ans=0.05 2024-08-13 07:00:07,475 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 07:00:09,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2043330.0, ans=0.95 2024-08-13 07:00:12,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.355e+01 2.665e+01 2.989e+01 4.736e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-13 07:00:19,801 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.298e+01 2024-08-13 07:00:20,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1450, loss[loss=0.1014, beats_loss=0.01155, ecapa_loss=0.0001425, whisper_loss=0.08843, over 17661.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001603, whisper_loss=0.09008, over 3822714.74 frames. ], batch size: 67, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:00:20,922 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 07:00:55,225 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 07:00:58,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=2043530.0, ans=15.0 2024-08-13 07:01:07,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2043530.0, ans=0.0 2024-08-13 07:01:10,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. 
limit=22.5 2024-08-13 07:01:16,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2043630.0, ans=0.1 2024-08-13 07:01:26,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2043730.0, ans=0.1 2024-08-13 07:01:40,829 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 07:01:42,108 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 07:01:46,271 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 07:01:51,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1500, loss[loss=0.118, beats_loss=0.008951, ecapa_loss=0.0001735, whisper_loss=0.1073, over 23082.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01087, ecapa_loss=0.0001598, whisper_loss=0.08904, over 3810784.92 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:02:07,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2044030.0, ans=0.2 2024-08-13 07:02:10,683 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 07:02:18,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2044130.0, ans=0.2 2024-08-13 07:02:30,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2024-08-13 07:02:32,078 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 07:02:37,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-08-13 07:02:40,170 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 11 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-13 07:02:51,409 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.422e+01 2.612e+01 2.997e+01 7.275e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 07:02:53,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-13 07:02:59,617 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1550, loss[loss=0.08432, beats_loss=0.01364, ecapa_loss=0.0001017, whisper_loss=0.06966, over 14959.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01092, ecapa_loss=0.0001588, whisper_loss=0.0887, over 3817636.82 frames. 
], batch size: 57, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:03:20,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2044530.0, ans=0.125 2024-08-13 07:03:33,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2044630.0, ans=0.5 2024-08-13 07:03:36,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2044630.0, ans=0.1 2024-08-13 07:03:42,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2044730.0, ans=0.125 2024-08-13 07:04:00,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2044830.0, ans=0.1 2024-08-13 07:04:03,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2044830.0, ans=0.2 2024-08-13 07:04:09,485 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1600, loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0002114, whisper_loss=0.08975, over 13752.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01093, ecapa_loss=0.0001596, whisper_loss=0.08914, over 3844393.16 frames. ], batch size: 55, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:04:14,163 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 07:04:23,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. 
limit=22.5 2024-08-13 07:04:32,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2045030.0, ans=0.2 2024-08-13 07:04:35,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2045030.0, ans=0.025 2024-08-13 07:04:38,495 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-13 07:04:41,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-13 07:04:43,409 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 07:04:49,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2024-08-13 07:04:58,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2045230.0, ans=0.0 2024-08-13 07:05:10,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.418e+01 2.670e+01 2.986e+01 1.271e+02, threshold=5.339e+01, percent-clipped=4.0 2024-08-13 07:05:13,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2045330.0, ans=0.125 2024-08-13 07:05:19,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2045430.0, ans=0.035 2024-08-13 07:05:20,003 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1650, loss[loss=0.08543, beats_loss=0.01034, ecapa_loss=0.0001683, whisper_loss=0.07341, over 18666.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01085, ecapa_loss=0.0001602, whisper_loss=0.08982, over 3857270.59 frames. ], batch size: 75, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:05:21,528 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 07:05:26,774 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 07:05:29,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2045430.0, ans=0.125 2024-08-13 07:05:41,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2045530.0, ans=0.0 2024-08-13 07:05:44,970 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 07:05:55,147 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 07:06:00,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2045730.0, ans=0.0 2024-08-13 07:06:12,059 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 07:06:29,144 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1700, loss[loss=0.1177, beats_loss=0.009796, ecapa_loss=0.0001467, whisper_loss=0.1064, over 23107.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001606, whisper_loss=0.09076, over 3825963.21 frames. ], batch size: 88, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:06:35,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2045930.0, ans=0.125 2024-08-13 07:06:54,985 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
29 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-13 07:07:04,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2046130.0, ans=0.0 2024-08-13 07:07:31,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.428e+01 2.659e+01 3.089e+01 1.627e+02, threshold=5.319e+01, percent-clipped=1.0 2024-08-13 07:07:39,818 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1750, loss[loss=0.08498, beats_loss=0.01165, ecapa_loss=0.0001769, whisper_loss=0.07156, over 15170.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001616, whisper_loss=0.09182, over 3829038.38 frames. ], batch size: 62, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:08:09,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2046630.0, ans=0.0 2024-08-13 07:08:09,927 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=12.0 2024-08-13 07:08:25,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2046730.0, ans=0.125 2024-08-13 07:08:28,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2046730.0, ans=0.02 2024-08-13 07:08:30,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2046730.0, ans=0.07 2024-08-13 07:08:49,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1800, loss[loss=0.09378, beats_loss=0.01085, ecapa_loss=0.0001419, whisper_loss=0.08151, over 17157.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.0001617, whisper_loss=0.09193, over 3822097.13 frames. 
], batch size: 66, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:09:10,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-13 07:09:17,607 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 07:09:20,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2047130.0, ans=0.2 2024-08-13 07:09:21,735 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 07:09:30,327 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.814e+01 2024-08-13 07:09:36,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2047230.0, ans=0.125 2024-08-13 07:09:38,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2047230.0, ans=0.0 2024-08-13 07:09:48,845 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 07:09:51,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.458e+01 2.695e+01 3.131e+01 5.479e+01, threshold=5.391e+01, percent-clipped=1.0 2024-08-13 07:09:59,917 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1850, loss[loss=0.1154, beats_loss=0.009837, ecapa_loss=0.0001573, whisper_loss=0.104, over 22600.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001613, whisper_loss=0.09191, over 3821498.17 frames. 
], batch size: 88, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:10:01,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2047430.0, ans=0.125 2024-08-13 07:10:04,680 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 07:10:08,472 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 07:10:18,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2047530.0, ans=0.125 2024-08-13 07:10:27,497 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 19 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 07:10:39,483 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-13 07:10:46,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2047730.0, ans=0.1 2024-08-13 07:10:49,349 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-13 07:10:53,516 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 07:11:02,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2024-08-13 07:11:08,065 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1900, loss[loss=0.06785, beats_loss=0.01388, ecapa_loss=0.0001584, whisper_loss=0.05238, over 19895.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001623, whisper_loss=0.09101, over 3801342.53 frames. 
], batch size: 81, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:11:17,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2047930.0, ans=0.125 2024-08-13 07:11:22,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2048030.0, ans=0.2 2024-08-13 07:11:31,007 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.134e-02 2024-08-13 07:11:44,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2048130.0, ans=0.125 2024-08-13 07:11:46,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2048130.0, ans=0.1 2024-08-13 07:11:54,562 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-13 07:12:02,935 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=8.0 2024-08-13 07:12:09,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.366e+01 2.636e+01 3.036e+01 8.197e+01, threshold=5.272e+01, percent-clipped=3.0 2024-08-13 07:12:09,566 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 07:12:09,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2048330.0, ans=0.5 2024-08-13 07:12:14,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2048330.0, ans=0.2 2024-08-13 07:12:17,858 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 1950, loss[loss=0.08941, beats_loss=0.01173, ecapa_loss=0.0001558, whisper_loss=0.07612, over 23099.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001623, whisper_loss=0.09086, over 3812650.62 frames. ], batch size: 95, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:12:23,734 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-13 07:12:26,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2048430.0, ans=0.0 2024-08-13 07:12:30,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-13 07:12:35,642 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 11 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 07:12:37,262 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 07:12:43,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2048530.0, ans=22.5 2024-08-13 07:12:53,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2048630.0, ans=0.125 2024-08-13 07:13:13,621 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 07:13:16,344 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 07:13:18,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2048830.0, ans=0.5 2024-08-13 07:13:33,749 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2000, loss[loss=0.1146, beats_loss=0.01035, ecapa_loss=0.0001375, whisper_loss=0.1029, over 22879.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001626, whisper_loss=0.09068, over 3804775.81 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:13:40,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-13 07:14:11,221 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 07:14:25,508 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 07:14:25,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2049230.0, ans=0.0 2024-08-13 07:14:42,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.431e+01 2.683e+01 2.951e+01 6.273e+01, threshold=5.366e+01, percent-clipped=2.0 2024-08-13 07:14:50,336 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 07:14:51,645 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2050, loss[loss=0.08772, beats_loss=0.01221, ecapa_loss=0.0001592, whisper_loss=0.07391, over 21683.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001617, whisper_loss=0.09063, over 3826013.82 frames. 
], batch size: 87, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:15:59,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2049830.0, ans=0.125 2024-08-13 07:16:03,107 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.869e+05 2024-08-13 07:16:08,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2100, loss[loss=0.1069, beats_loss=0.01026, ecapa_loss=0.0001579, whisper_loss=0.09511, over 23208.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0108, ecapa_loss=0.0001608, whisper_loss=0.08937, over 3778363.00 frames. ], batch size: 92, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:16:22,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2024-08-13 07:16:25,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. 
limit=15.0 2024-08-13 07:16:34,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2050030.0, ans=0.2 2024-08-13 07:16:49,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2050130.0, ans=0.0 2024-08-13 07:16:53,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2050230.0, ans=0.0 2024-08-13 07:16:53,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2050230.0, ans=0.2 2024-08-13 07:17:13,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2050330.0, ans=0.2 2024-08-13 07:17:15,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.374e+01 2.616e+01 2.948e+01 7.626e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 07:17:24,456 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2150, loss[loss=0.1111, beats_loss=0.01106, ecapa_loss=0.0001619, whisper_loss=0.09843, over 22413.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001618, whisper_loss=0.08998, over 3825276.00 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:17:26,057 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 from AS 2024-08-13 07:17:39,214 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 07:17:52,301 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-13 07:17:52,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.51 vs.
limit=15.0 2024-08-13 07:17:53,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2050630.0, ans=0.1 2024-08-13 07:18:09,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2050730.0, ans=0.0 2024-08-13 07:18:18,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2050730.0, ans=0.125 2024-08-13 07:18:23,146 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 37 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 07:18:37,853 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2200, loss[loss=0.1011, beats_loss=0.01275, ecapa_loss=0.0001412, whisper_loss=0.08694, over 19755.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.0001604, whisper_loss=0.09051, over 3834267.76 frames. ], batch size: 79, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:18:42,651 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-13 07:18:50,299 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 16 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 07:19:09,265 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 07:19:14,013 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 from AS 2024-08-13 07:19:24,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2051230.0, ans=0.0 2024-08-13 07:19:34,240 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
28 from LS+wenet, 18 from Vox, 46 from AS 2024-08-13 07:19:43,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.405e+01 2.692e+01 3.101e+01 3.996e+01, threshold=5.385e+01, percent-clipped=0.0 2024-08-13 07:19:52,994 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2250, loss[loss=0.1186, beats_loss=0.01178, ecapa_loss=0.0001609, whisper_loss=0.1052, over 22766.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01092, ecapa_loss=0.0001613, whisper_loss=0.0907, over 3836040.88 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:19:53,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2051430.0, ans=0.0 2024-08-13 07:19:57,777 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 07:20:10,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2051530.0, ans=0.125 2024-08-13 07:20:23,268 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 from AS 2024-08-13 07:20:31,162 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 from AS 2024-08-13 07:20:34,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2051630.0, ans=0.0 2024-08-13 07:20:35,881 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 07:20:39,068 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 21 from LS+wenet, 13 from Vox, 34 from AS 2024-08-13 07:20:56,547 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 from AS 2024-08-13 07:20:59,038 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts.
29 from LS+wenet, 21 from Vox, 33 from AS 2024-08-13 07:21:07,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2051830.0, ans=0.0 2024-08-13 07:21:10,956 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2300, loss[loss=0.1055, beats_loss=0.009637, ecapa_loss=0.0001808, whisper_loss=0.09405, over 21090.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001622, whisper_loss=0.09119, over 3858999.20 frames. ], batch size: 86, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:21:13,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2051930.0, ans=0.1 2024-08-13 07:21:22,433 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 07:21:58,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2052230.0, ans=0.05 2024-08-13 07:22:08,543 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 07:22:09,944 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 from AS 2024-08-13 07:22:20,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.551e+01 2.802e+01 3.286e+01 4.961e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 07:22:23,713 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. limit=10.0 2024-08-13 07:22:27,881 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2350, loss[loss=0.1154, beats_loss=0.00892, ecapa_loss=0.0001523, whisper_loss=0.1049, over 21885.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.000163, whisper_loss=0.09215, over 3830722.98 frames.
], batch size: 84, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:22:32,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2052430.0, ans=0.0 2024-08-13 07:22:35,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2052430.0, ans=0.5 2024-08-13 07:22:36,832 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 07:22:44,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2052530.0, ans=0.0 2024-08-13 07:22:58,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2052630.0, ans=0.0 2024-08-13 07:23:00,377 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-13 07:23:35,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2052830.0, ans=0.125 2024-08-13 07:23:43,130 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2400, loss[loss=0.08742, beats_loss=0.009933, ecapa_loss=0.0001607, whisper_loss=0.07587, over 15287.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001617, whisper_loss=0.09166, over 3817444.18 frames. ], batch size: 61, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:23:55,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2052930.0, ans=0.025 2024-08-13 07:23:58,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2053030.0, ans=0.2 2024-08-13 07:24:21,840 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts.
24 from LS+wenet, 15 from Vox, 35 from AS 2024-08-13 07:24:33,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2053230.0, ans=0.07 2024-08-13 07:24:39,521 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 from AS 2024-08-13 07:24:39,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2053230.0, ans=0.125 2024-08-13 07:24:41,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-08-13 07:24:49,183 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 from AS 2024-08-13 07:24:52,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.349e+01 2.648e+01 3.316e+01 5.305e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:24:59,238 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 07:25:00,252 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2450, loss[loss=0.09689, beats_loss=0.0114, ecapa_loss=0.0001369, whisper_loss=0.08412, over 19029.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001615, whisper_loss=0.09102, over 3843279.79 frames. ], batch size: 75, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:25:02,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2053430.0, ans=0.125 2024-08-13 07:25:05,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs.
limit=15.0 2024-08-13 07:25:11,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2053430.0, ans=0.0 2024-08-13 07:25:12,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2053430.0, ans=0.125 2024-08-13 07:25:17,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=2053530.0, ans=6.0 2024-08-13 07:25:33,151 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:25:35,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-08-13 07:25:44,382 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 from AS 2024-08-13 07:25:59,380 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS 2024-08-13 07:26:00,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2053830.0, ans=0.0 2024-08-13 07:26:01,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2053830.0, ans=0.125 2024-08-13 07:26:02,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2053830.0, ans=0.0 2024-08-13 07:26:16,241 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2500, loss[loss=0.1017, beats_loss=0.008855, ecapa_loss=0.0001537, whisper_loss=0.09134, over 17805.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.0001614, whisper_loss=0.09032, over 3857779.42 frames.
], batch size: 70, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:26:27,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2053930.0, ans=0.125 2024-08-13 07:26:37,651 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 from AS 2024-08-13 07:26:50,525 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-13 07:27:01,824 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 07:27:11,086 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS 2024-08-13 07:27:25,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.432e+01 2.694e+01 2.986e+01 7.508e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-13 07:27:28,838 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 07:27:29,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2054330.0, ans=0.0 2024-08-13 07:27:33,208 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2550, loss[loss=0.1296, beats_loss=0.008014, ecapa_loss=0.0001823, whisper_loss=0.1198, over 16814.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.000162, whisper_loss=0.09119, over 3867518.56 frames. ], batch size: 65, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:27:37,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-13 07:27:41,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2054430.0, ans=0.5 2024-08-13 07:27:55,842 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
24 from LS+wenet, 19 from Vox, 50 from AS 2024-08-13 07:27:57,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2054530.0, ans=0.125 2024-08-13 07:28:04,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-13 07:28:23,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2054730.0, ans=0.2 2024-08-13 07:28:30,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0 2024-08-13 07:28:43,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-13 07:28:47,202 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2600, loss[loss=0.1088, beats_loss=0.00963, ecapa_loss=0.000165, whisper_loss=0.09751, over 21772.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001634, whisper_loss=0.09135, over 3880637.96 frames.
], batch size: 88, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:28:47,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2054930.0, ans=0.1 2024-08-13 07:28:49,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2054930.0, ans=0.125 2024-08-13 07:28:50,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2054930.0, ans=0.125 2024-08-13 07:28:54,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2054930.0, ans=0.125 2024-08-13 07:29:18,518 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 from AS 2024-08-13 07:29:18,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2055130.0, ans=0.125 2024-08-13 07:29:24,204 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 from AS 2024-08-13 07:29:41,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2055230.0, ans=0.125 2024-08-13 07:29:44,316 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 15 from LS+wenet, 22 from Vox, 38 from AS 2024-08-13 07:29:45,873 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 07:29:47,360 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-13 07:29:52,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2055330.0, ans=0.125 2024-08-13 07:29:56,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.419e+01 2.702e+01 3.112e+01 4.104e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-13 07:29:57,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2055330.0, ans=0.125 2024-08-13 07:30:03,671 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2650, loss[loss=0.1003, beats_loss=0.01076, ecapa_loss=0.00013, whisper_loss=0.08819, over 18907.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01086, ecapa_loss=0.000163, whisper_loss=0.09064, over 3848704.34 frames. ], batch size: 71, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:30:21,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2055530.0, ans=0.125 2024-08-13 07:30:48,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2055730.0, ans=0.125 2024-08-13 07:31:04,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2055830.0, ans=0.125 2024-08-13 07:31:06,323 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 from AS 2024-08-13 07:31:20,114 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2700, loss[loss=0.1159, beats_loss=0.009088, ecapa_loss=0.0001912, whisper_loss=0.1049, over 17923.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.000164, whisper_loss=0.09109, over 3858848.85 frames.
], batch size: 75, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:31:21,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2055930.0, ans=0.0 2024-08-13 07:31:36,604 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-08-13 07:31:45,621 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 from AS 2024-08-13 07:32:05,179 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 17 from LS+wenet, 17 from Vox, 38 from AS 2024-08-13 07:32:14,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-13 07:32:28,788 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.371e+01 2.713e+01 3.227e+01 1.003e+02, threshold=5.426e+01, percent-clipped=2.0 2024-08-13 07:32:36,809 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2750, loss[loss=0.08892, beats_loss=0.01123, ecapa_loss=0.000209, whisper_loss=0.0756, over 20596.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001642, whisper_loss=0.0917, over 3829018.95 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:32:37,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs.
limit=15.0 2024-08-13 07:32:38,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2056430.0, ans=0.125 2024-08-13 07:32:52,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2056530.0, ans=0.2 2024-08-13 07:32:57,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2024-08-13 07:33:07,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2024-08-13 07:33:17,227 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-13 07:33:51,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2056830.0, ans=0.125 2024-08-13 07:33:55,701 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2800, loss[loss=0.1011, beats_loss=0.01116, ecapa_loss=0.0001412, whisper_loss=0.08857, over 18540.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001635, whisper_loss=0.09148, over 3843279.69 frames.
], batch size: 71, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:34:06,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2056930.0, ans=0.1 2024-08-13 07:34:18,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2057030.0, ans=0.07 2024-08-13 07:34:22,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2057030.0, ans=0.125 2024-08-13 07:34:26,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2057130.0, ans=0.125 2024-08-13 07:34:36,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2057130.0, ans=0.125 2024-08-13 07:34:36,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2057130.0, ans=0.0 2024-08-13 07:34:49,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2057230.0, ans=0.125 2024-08-13 07:34:52,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2057230.0, ans=0.125 2024-08-13 07:34:55,227 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 07:35:08,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.448e+01 2.685e+01 2.951e+01 5.516e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-13 07:35:08,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs.
limit=22.5 2024-08-13 07:35:15,868 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2850, loss[loss=0.1037, beats_loss=0.01027, ecapa_loss=0.0001692, whisper_loss=0.09174, over 16205.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001627, whisper_loss=0.09117, over 3837901.91 frames. ], batch size: 64, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:36:02,847 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:36:12,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=12.0 2024-08-13 07:36:38,278 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2900, loss[loss=0.09573, beats_loss=0.01134, ecapa_loss=0.0001597, whisper_loss=0.08279, over 19938.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001641, whisper_loss=0.09105, over 3867161.78 frames. ], batch size: 78, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:36:38,491 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 07:36:52,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2057930.0, ans=0.1 2024-08-13 07:36:59,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2058030.0, ans=0.5 2024-08-13 07:37:15,581 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts.
19 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 07:37:18,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2058130.0, ans=0.125 2024-08-13 07:37:34,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2024-08-13 07:37:35,638 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 28 from Vox, 45 from AS 2024-08-13 07:37:38,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2058230.0, ans=0.1 2024-08-13 07:37:43,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2024-08-13 07:37:44,019 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 from AS 2024-08-13 07:37:49,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.692e+01 3.123e+01 5.434e+01, threshold=5.383e+01, percent-clipped=1.0 2024-08-13 07:37:56,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-13 07:37:58,366 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 2950, loss[loss=0.09687, beats_loss=0.01184, ecapa_loss=0.0001366, whisper_loss=0.08367, over 23229.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001641, whisper_loss=0.09075, over 3871250.56 frames. ], batch size: 94, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:38:08,305 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 from AS 2024-08-13 07:38:13,155 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
28 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 07:38:15,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2058530.0, ans=0.125 2024-08-13 07:38:34,761 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-13 07:38:54,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2058730.0, ans=0.125 2024-08-13 07:39:02,826 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 from AS 2024-08-13 07:39:05,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2058830.0, ans=0.0 2024-08-13 07:39:23,176 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3000, loss[loss=0.1074, beats_loss=0.01022, ecapa_loss=0.0001673, whisper_loss=0.09552, over 20462.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.0001641, whisper_loss=0.09079, over 3896000.38 frames. ], batch size: 81, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:39:23,177 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 07:40:01,999 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005768, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 07:40:19,426 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on SV_voxceleb1: loss=0.00457, beats_loss=0, ecapa_loss=0.000457, whisper_loss=0, over 939242.00 frames. 2024-08-13 07:42:09,731 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 07:42:09,735 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 07:42:39,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-08-13 07:42:51,911 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 07:43:15,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-13 07:43:24,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2059330.0, ans=0.02 2024-08-13 07:43:30,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.520e+01 2.888e+01 3.342e+01 5.667e+01, threshold=5.776e+01, percent-clipped=1.0 2024-08-13 07:43:30,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2059330.0, ans=0.95 2024-08-13 07:43:34,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2059330.0, ans=0.125 2024-08-13 07:43:38,292 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3050, loss[loss=0.123, beats_loss=0.008681, ecapa_loss=0.0001878, whisper_loss=0.1125, over 22164.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001647, whisper_loss=0.09114, over 3900292.20 frames. ], batch size: 84, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:43:49,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs.
limit=15.0 2024-08-13 07:44:00,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2059530.0, ans=0.04949747468305833 2024-08-13 07:44:12,278 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 07:44:13,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2059630.0, ans=0.05 2024-08-13 07:44:16,075 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 07:44:16,363 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:44:37,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2059730.0, ans=0.125 2024-08-13 07:44:40,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2059730.0, ans=0.0 2024-08-13 07:45:04,414 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3100, loss[loss=0.1165, beats_loss=0.0111, ecapa_loss=0.0001657, whisper_loss=0.1038, over 18368.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001657, whisper_loss=0.09176, over 3892494.26 frames. ], batch size: 72, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:45:04,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2059930.0, ans=0.0 2024-08-13 07:45:25,930 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 07:45:36,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. 
limit=22.5 2024-08-13 07:45:40,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2060130.0, ans=0.125 2024-08-13 07:45:51,176 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 07:46:12,492 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 07:46:22,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.355e+01 2.648e+01 2.914e+01 4.175e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:46:25,982 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 07:46:30,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3150, loss[loss=0.1124, beats_loss=0.008644, ecapa_loss=0.0002145, whisper_loss=0.1016, over 17388.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01078, ecapa_loss=0.0001674, whisper_loss=0.09196, over 3856305.33 frames. ], batch size: 67, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:46:31,064 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 07:46:31,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-13 07:46:43,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2060430.0, ans=0.0 2024-08-13 07:46:45,359 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 07:46:49,660 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 07:46:57,772 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 
20 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-13 07:47:05,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2060630.0, ans=0.125 2024-08-13 07:47:09,835 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 07:47:32,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2060730.0, ans=0.0 2024-08-13 07:47:37,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2060730.0, ans=0.125 2024-08-13 07:47:41,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2060830.0, ans=0.125 2024-08-13 07:47:46,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2060830.0, ans=0.125 2024-08-13 07:47:52,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-13 07:47:59,459 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3200, loss[loss=0.1057, beats_loss=0.01127, ecapa_loss=0.0001531, whisper_loss=0.09286, over 21449.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01083, ecapa_loss=0.0001655, whisper_loss=0.09278, over 3861210.99 frames. 
], batch size: 84, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:48:07,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2060930.0, ans=0.0 2024-08-13 07:48:42,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2061130.0, ans=0.125 2024-08-13 07:48:48,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2061130.0, ans=0.125 2024-08-13 07:48:52,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2061230.0, ans=0.125 2024-08-13 07:49:17,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.368e+01 2.637e+01 3.039e+01 1.272e+02, threshold=5.274e+01, percent-clipped=1.0 2024-08-13 07:49:25,846 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3250, loss[loss=0.1074, beats_loss=0.006155, ecapa_loss=0.0002137, whisper_loss=0.09913, over 15031.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.000166, whisper_loss=0.09228, over 3872321.89 frames. ], batch size: 55, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:49:26,072 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
25 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-13 07:49:31,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2061430.0, ans=0.125 2024-08-13 07:50:01,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2061630.0, ans=0.1 2024-08-13 07:50:13,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2061630.0, ans=0.0 2024-08-13 07:50:23,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2061730.0, ans=0.125 2024-08-13 07:50:34,359 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-13 07:50:38,117 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-13 07:50:40,813 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 07:50:50,040 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3300, loss[loss=0.09927, beats_loss=0.01149, ecapa_loss=0.000168, whisper_loss=0.0861, over 18378.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001659, whisper_loss=0.0913, over 3879590.57 frames. 
], batch size: 75, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:51:07,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2062030.0, ans=0.0 2024-08-13 07:51:13,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2062030.0, ans=0.0 2024-08-13 07:51:26,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2062130.0, ans=0.0 2024-08-13 07:51:47,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2062230.0, ans=0.2 2024-08-13 07:51:49,565 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 07:52:08,008 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.435e+01 2.813e+01 3.312e+01 6.245e+01, threshold=5.626e+01, percent-clipped=3.0 2024-08-13 07:52:08,479 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 07:52:15,988 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3350, loss[loss=0.08579, beats_loss=0.008637, ecapa_loss=0.0001894, whisper_loss=0.07526, over 16782.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001662, whisper_loss=0.09094, over 3881551.75 frames. ], batch size: 65, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:52:30,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2062430.0, ans=0.0 2024-08-13 07:53:12,657 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 18 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 07:53:35,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. 
limit=15.0 2024-08-13 07:53:38,157 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3400, loss[loss=0.09052, beats_loss=0.01197, ecapa_loss=0.0001799, whisper_loss=0.07676, over 18894.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001664, whisper_loss=0.09088, over 3894574.84 frames. ], batch size: 80, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:53:59,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2063030.0, ans=0.125 2024-08-13 07:54:01,202 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 07:54:08,103 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 07:54:21,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-08-13 07:54:40,500 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.860e+05 2024-08-13 07:54:53,481 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.342e+01 2.542e+01 2.769e+01 4.852e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-13 07:54:55,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2063330.0, ans=0.025 2024-08-13 07:54:59,888 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 07:55:00,985 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3450, loss[loss=0.1101, beats_loss=0.01019, ecapa_loss=0.0001295, whisper_loss=0.09865, over 20347.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001668, whisper_loss=0.09078, over 3870233.68 frames. 
], batch size: 76, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:55:03,554 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 20 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 07:55:05,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2063430.0, ans=0.2 2024-08-13 07:55:17,153 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 07:55:22,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2063530.0, ans=0.125 2024-08-13 07:55:26,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2063530.0, ans=0.5 2024-08-13 07:55:37,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.79 vs. limit=15.0 2024-08-13 07:55:52,584 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 07:55:54,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2063730.0, ans=0.125 2024-08-13 07:56:05,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2063830.0, ans=0.1 2024-08-13 07:56:13,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2063830.0, ans=0.0 2024-08-13 07:56:22,711 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3500, loss[loss=0.09238, beats_loss=0.01138, ecapa_loss=0.0001626, whisper_loss=0.07938, over 16060.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001677, whisper_loss=0.09044, over 3835782.24 frames. 
], batch size: 66, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:56:28,381 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 07:56:32,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2063930.0, ans=0.125 2024-08-13 07:56:45,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2064030.0, ans=0.1 2024-08-13 07:56:48,531 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 07:56:58,450 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 07:57:01,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5 2024-08-13 07:57:08,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2064130.0, ans=0.05 2024-08-13 07:57:11,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2064230.0, ans=0.125 2024-08-13 07:57:14,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2064230.0, ans=0.1 2024-08-13 07:57:19,880 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
31 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 07:57:26,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2064230.0, ans=0.05 2024-08-13 07:57:29,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2064330.0, ans=0.2 2024-08-13 07:57:34,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=12.0 2024-08-13 07:57:37,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.517e+01 2.798e+01 3.148e+01 5.290e+01, threshold=5.596e+01, percent-clipped=1.0 2024-08-13 07:57:45,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2064430.0, ans=0.2 2024-08-13 07:57:47,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3550, loss[loss=0.1328, beats_loss=0.009271, ecapa_loss=0.000131, whisper_loss=0.1223, over 15133.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001672, whisper_loss=0.09139, over 3873202.95 frames. ], batch size: 55, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:57:49,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2064430.0, ans=0.1 2024-08-13 07:58:01,842 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 07:58:14,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2064530.0, ans=0.1 2024-08-13 07:58:21,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0 2024-08-13 07:58:22,060 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 07:58:41,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2064730.0, ans=0.125 2024-08-13 07:58:52,030 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 07:59:06,516 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 07:59:11,615 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3600, loss[loss=0.1133, beats_loss=0.008992, ecapa_loss=0.0001839, whisper_loss=0.1025, over 19101.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001666, whisper_loss=0.0915, over 3875093.67 frames. ], batch size: 75, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:59:13,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2064930.0, ans=0.2 2024-08-13 07:59:25,704 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 17 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-13 07:59:29,367 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 07:59:38,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2065030.0, ans=0.125 2024-08-13 07:59:40,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2065030.0, ans=0.0 2024-08-13 07:59:44,747 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 07:59:56,364 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 08:00:19,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.98 vs. 
limit=10.0 2024-08-13 08:00:20,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2065330.0, ans=0.125 2024-08-13 08:00:27,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.371e+01 2.703e+01 3.040e+01 5.839e+01, threshold=5.406e+01, percent-clipped=1.0 2024-08-13 08:00:36,741 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3650, loss[loss=0.1166, beats_loss=0.01188, ecapa_loss=0.0001717, whisper_loss=0.103, over 21695.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001661, whisper_loss=0.09149, over 3888822.40 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:00:39,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2065430.0, ans=0.125 2024-08-13 08:00:39,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2065430.0, ans=0.1 2024-08-13 08:00:50,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2065430.0, ans=0.125 2024-08-13 08:00:59,582 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 08:01:09,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2065530.0, ans=0.125 2024-08-13 08:01:09,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.22 vs. 
limit=15.0 2024-08-13 08:01:12,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2065630.0, ans=0.125 2024-08-13 08:01:42,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.35 vs. limit=22.5 2024-08-13 08:02:00,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2065930.0, ans=0.07 2024-08-13 08:02:01,029 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3700, loss[loss=0.1131, beats_loss=0.008091, ecapa_loss=0.0001798, whisper_loss=0.1032, over 17684.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001655, whisper_loss=0.09182, over 3861249.00 frames. ], batch size: 71, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:02:11,107 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 20 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 08:02:12,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-08-13 08:02:55,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2066230.0, ans=0.0 2024-08-13 08:03:12,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. 
limit=15.0 2024-08-13 08:03:13,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.360e+01 2.624e+01 2.875e+01 4.532e+01, threshold=5.249e+01, percent-clipped=0.0 2024-08-13 08:03:18,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2066330.0, ans=0.125 2024-08-13 08:03:20,780 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3750, loss[loss=0.09655, beats_loss=0.01156, ecapa_loss=0.0001443, whisper_loss=0.08355, over 15004.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001659, whisper_loss=0.09175, over 3833337.12 frames. ], batch size: 59, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:03:26,629 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 31 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 08:03:30,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2066430.0, ans=0.1 2024-08-13 08:03:59,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-13 08:04:00,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2066630.0, ans=0.1 2024-08-13 08:04:05,204 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 26 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-13 08:04:21,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2066730.0, ans=0.0 2024-08-13 08:04:23,852 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 08:04:34,424 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
16 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-13 08:04:43,685 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3800, loss[loss=0.09256, beats_loss=0.01117, ecapa_loss=0.0001938, whisper_loss=0.07945, over 15396.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001664, whisper_loss=0.09147, over 3849109.51 frames. ], batch size: 65, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:04:48,567 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 08:04:58,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2067030.0, ans=0.125 2024-08-13 08:04:58,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2067030.0, ans=0.0 2024-08-13 08:05:05,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2067030.0, ans=0.1 2024-08-13 08:05:32,832 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-08-13 08:05:37,234 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 08:05:41,871 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 40 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 08:05:51,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2067330.0, ans=0.05 2024-08-13 08:05:55,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.450e+01 2.726e+01 3.001e+01 5.077e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-13 08:05:57,892 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 08:06:03,509 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3850, loss[loss=0.08477, beats_loss=0.0115, ecapa_loss=0.0001352, whisper_loss=0.07192, over 14618.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001664, whisper_loss=0.09132, over 3868790.34 frames. ], batch size: 55, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:06:10,246 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5 2024-08-13 08:06:16,176 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 08:06:19,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=22.5 2024-08-13 08:06:39,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-13 08:07:02,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. 
limit=22.5 2024-08-13 08:07:08,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2067730.0, ans=0.125 2024-08-13 08:07:10,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2067730.0, ans=0.125 2024-08-13 08:07:18,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2067830.0, ans=0.07 2024-08-13 08:07:28,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2067930.0, ans=0.1 2024-08-13 08:07:29,589 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3900, loss[loss=0.1079, beats_loss=0.01001, ecapa_loss=0.0002052, whisper_loss=0.09588, over 22522.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.000167, whisper_loss=0.09143, over 3873154.88 frames. ], batch size: 95, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:07:50,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2068030.0, ans=0.2 2024-08-13 08:07:56,912 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 08:08:02,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2068130.0, ans=0.125 2024-08-13 08:08:12,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2068130.0, ans=0.125 2024-08-13 08:08:30,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2068230.0, ans=0.125 2024-08-13 08:08:34,450 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2024-08-13 08:08:43,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.513e+01 2.786e+01 3.251e+01 6.128e+01, threshold=5.571e+01, percent-clipped=2.0 2024-08-13 08:08:52,102 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 3950, loss[loss=0.07997, beats_loss=0.01143, ecapa_loss=0.0001693, whisper_loss=0.06685, over 20335.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01077, ecapa_loss=0.0001683, whisper_loss=0.09224, over 3892448.80 frames. ], batch size: 84, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:08:54,477 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 08:09:02,422 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-13 08:09:08,073 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:09:13,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2068530.0, ans=0.125 2024-08-13 08:09:22,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2068530.0, ans=0.1 2024-08-13 08:09:48,416 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 34 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 08:09:52,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2068730.0, ans=0.1 2024-08-13 08:10:02,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2068830.0, ans=0.2 2024-08-13 08:10:09,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2068830.0, ans=0.1 2024-08-13 08:10:20,819 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4000, loss[loss=0.09437, beats_loss=0.01196, ecapa_loss=0.0001665, whisper_loss=0.08074, over 16940.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001685, whisper_loss=0.09225, over 3896348.51 frames. ], batch size: 67, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:10:21,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2068930.0, ans=0.125 2024-08-13 08:10:23,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2024-08-13 08:10:24,424 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 08:10:34,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2068930.0, ans=0.0 2024-08-13 08:10:42,040 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 08:10:49,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2069030.0, ans=0.125 2024-08-13 08:10:58,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2069130.0, ans=0.0 2024-08-13 08:10:58,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2069130.0, ans=0.0 2024-08-13 08:10:59,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2069130.0, ans=0.0 2024-08-13 08:11:31,905 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 08:11:34,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.360e+01 2.605e+01 2.925e+01 4.033e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 08:11:42,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2069430.0, ans=0.0 2024-08-13 08:11:43,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4050, loss[loss=0.1092, beats_loss=0.01142, ecapa_loss=0.0001587, whisper_loss=0.09623, over 20410.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001681, whisper_loss=0.09187, over 3914715.96 frames. ], batch size: 80, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:11:44,016 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 08:11:45,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2069430.0, ans=0.125 2024-08-13 08:11:54,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2069430.0, ans=0.125 2024-08-13 08:12:00,732 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-13 08:12:11,941 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 08:12:16,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2069630.0, ans=0.125 2024-08-13 08:12:19,848 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.812e-03 2024-08-13 08:12:24,783 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 08:12:46,075 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 29 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 08:13:08,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2069930.0, ans=0.0 2024-08-13 08:13:09,743 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4100, loss[loss=0.09302, beats_loss=0.0105, ecapa_loss=0.0001826, whisper_loss=0.08069, over 21757.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001681, whisper_loss=0.09161, over 3909530.17 frames. 
], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:13:44,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2070130.0, ans=0.05 2024-08-13 08:14:02,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-13 08:14:04,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2070230.0, ans=0.1 2024-08-13 08:14:11,216 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2024-08-13 08:14:11,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-13 08:14:14,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2070230.0, ans=0.07 2024-08-13 08:14:24,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.381e+01 2.702e+01 3.113e+01 4.589e+01, threshold=5.403e+01, percent-clipped=0.0 2024-08-13 08:14:30,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2024-08-13 08:14:32,197 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 08:14:33,750 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4150, loss[loss=0.1069, beats_loss=0.01137, ecapa_loss=0.0001749, whisper_loss=0.09376, over 22490.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001685, whisper_loss=0.09174, over 3892177.44 frames. 
], batch size: 90, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:14:34,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2070430.0, ans=0.125 2024-08-13 08:14:37,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2070430.0, ans=0.125 2024-08-13 08:14:37,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=15.0 2024-08-13 08:14:39,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2070430.0, ans=0.1 2024-08-13 08:14:40,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2070430.0, ans=0.125 2024-08-13 08:14:57,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2070530.0, ans=0.125 2024-08-13 08:15:18,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-13 08:15:21,117 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 08:15:37,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2070730.0, ans=10.0 2024-08-13 08:15:56,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4200, loss[loss=0.1131, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.1011, over 21836.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001674, whisper_loss=0.09171, over 3884418.74 frames. 
], batch size: 84, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:16:03,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2070930.0, ans=0.2 2024-08-13 08:16:07,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2070930.0, ans=0.125 2024-08-13 08:16:12,134 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 08:16:26,318 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:17:10,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.330e+01 2.608e+01 3.052e+01 6.792e+01, threshold=5.217e+01, percent-clipped=3.0 2024-08-13 08:17:18,445 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4250, loss[loss=0.07455, beats_loss=0.01133, ecapa_loss=0.0001947, whisper_loss=0.06127, over 20243.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.011, ecapa_loss=0.0001663, whisper_loss=0.09061, over 3886514.08 frames. ], batch size: 88, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:17:33,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2071430.0, ans=0.0 2024-08-13 08:17:33,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2071430.0, ans=0.125 2024-08-13 08:17:43,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2024-08-13 08:17:58,725 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
11 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 08:18:18,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2071730.0, ans=0.0 2024-08-13 08:18:21,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2071730.0, ans=0.125 2024-08-13 08:18:31,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2071830.0, ans=0.2 2024-08-13 08:18:37,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2071830.0, ans=0.015 2024-08-13 08:18:39,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2071930.0, ans=0.2 2024-08-13 08:18:40,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4300, loss[loss=0.104, beats_loss=0.01204, ecapa_loss=0.0001435, whisper_loss=0.09054, over 22710.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01098, ecapa_loss=0.000166, whisper_loss=0.09053, over 3854023.04 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:18:41,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2071930.0, ans=0.1 2024-08-13 08:19:03,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2072030.0, ans=0.125 2024-08-13 08:19:21,399 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 08:19:44,870 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 08:19:53,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.714e+01 2.965e+01 4.296e+01, threshold=5.429e+01, percent-clipped=0.0 2024-08-13 08:20:00,814 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4350, loss[loss=0.1079, beats_loss=0.01004, ecapa_loss=0.000192, whisper_loss=0.09595, over 19653.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.0001672, whisper_loss=0.09012, over 3826043.57 frames. ], batch size: 82, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:20:12,980 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 08:20:17,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2072530.0, ans=0.1 2024-08-13 08:20:26,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2024-08-13 08:20:28,204 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 08:20:32,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2024-08-13 08:21:01,277 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:21:05,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2072830.0, ans=0.125 2024-08-13 08:21:11,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. 
limit=15.0 2024-08-13 08:21:13,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2072830.0, ans=0.05 2024-08-13 08:21:13,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2072830.0, ans=0.0 2024-08-13 08:21:16,308 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 08:21:23,796 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4400, loss[loss=0.1269, beats_loss=0.01082, ecapa_loss=0.0001815, whisper_loss=0.1143, over 21984.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01094, ecapa_loss=0.0001662, whisper_loss=0.09066, over 3856112.66 frames. ], batch size: 87, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:21:28,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2072930.0, ans=0.125 2024-08-13 08:21:29,520 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 
18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 08:21:41,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2073030.0, ans=0.0 2024-08-13 08:21:42,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2073030.0, ans=0.125 2024-08-13 08:22:09,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2073130.0, ans=0.125 2024-08-13 08:22:23,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2073230.0, ans=0.05 2024-08-13 08:22:40,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.445e+01 2.799e+01 3.126e+01 5.864e+01, threshold=5.599e+01, percent-clipped=1.0 2024-08-13 08:22:47,166 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4450, loss[loss=0.1024, beats_loss=0.0115, ecapa_loss=0.0001485, whisper_loss=0.08944, over 19770.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001665, whisper_loss=0.09091, over 3846371.91 frames. ], batch size: 77, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:22:56,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2073430.0, ans=0.125 2024-08-13 08:23:13,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2073530.0, ans=0.2 2024-08-13 08:23:43,226 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 08:23:49,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2073730.0, ans=0.1 2024-08-13 08:23:58,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2073830.0, ans=0.2 2024-08-13 08:24:07,752 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4500, loss[loss=0.1108, beats_loss=0.0106, ecapa_loss=0.0001822, whisper_loss=0.09835, over 22042.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001661, whisper_loss=0.09119, over 3901258.28 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:24:20,377 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 08:24:34,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2074030.0, ans=0.125 2024-08-13 08:24:45,254 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 08:24:46,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=12.0 2024-08-13 08:24:47,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2074130.0, ans=0.1 2024-08-13 08:24:48,119 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 08:24:58,304 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. 
limit=15.0 2024-08-13 08:25:02,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2074230.0, ans=0.125 2024-08-13 08:25:11,979 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 08:25:15,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.670e+01 3.024e+01 4.135e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-13 08:25:23,347 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4550, loss[loss=0.1018, beats_loss=0.009904, ecapa_loss=0.000202, whisper_loss=0.08992, over 22894.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001681, whisper_loss=0.09102, over 3899546.06 frames. ], batch size: 93, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:25:27,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2074430.0, ans=0.0 2024-08-13 08:25:28,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2074430.0, ans=0.125 2024-08-13 08:25:39,012 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
24 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-13 08:25:40,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2074530.0, ans=0.125 2024-08-13 08:25:48,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2074530.0, ans=0.125 2024-08-13 08:26:01,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2074630.0, ans=0.0 2024-08-13 08:26:01,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2074630.0, ans=0.125 2024-08-13 08:26:06,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2074730.0, ans=0.2 2024-08-13 08:26:34,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4600, loss[loss=0.09818, beats_loss=0.01175, ecapa_loss=0.0001537, whisper_loss=0.08489, over 22527.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001668, whisper_loss=0.09132, over 3893801.17 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:26:41,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-08-13 08:26:52,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-08-13 08:26:55,306 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 20 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-13 08:26:59,648 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 08:27:24,134 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 08:27:41,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2075330.0, ans=0.2 2024-08-13 08:27:42,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.369e+01 2.617e+01 2.923e+01 4.349e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 08:27:49,122 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4650, loss[loss=0.09153, beats_loss=0.01369, ecapa_loss=0.0001606, whisper_loss=0.07623, over 21281.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001675, whisper_loss=0.09096, over 3878170.66 frames. ], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:28:12,022 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 08:28:23,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2075630.0, ans=0.1 2024-08-13 08:28:23,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2075630.0, ans=0.125 2024-08-13 08:28:39,709 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 08:28:48,876 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 08:28:55,371 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 08:28:58,347 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-08-13 08:29:00,673 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 08:29:00,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2075830.0, ans=0.125 2024-08-13 08:29:04,528 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4700, loss[loss=0.1083, beats_loss=0.01169, ecapa_loss=0.000199, whisper_loss=0.09458, over 21956.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001685, whisper_loss=0.09132, over 3875493.67 frames. ], batch size: 92, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:29:11,781 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 08:29:36,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-13 08:29:41,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2024-08-13 08:29:43,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2076130.0, ans=0.125 2024-08-13 08:29:51,845 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 08:29:53,092 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 08:30:01,644 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 22 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 08:30:02,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. 
limit=6.0 2024-08-13 08:30:04,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2076330.0, ans=0.0 2024-08-13 08:30:10,155 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 08:30:13,092 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.493e+01 2.764e+01 3.080e+01 1.960e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-13 08:30:20,479 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4750, loss[loss=0.08666, beats_loss=0.01156, ecapa_loss=0.0001779, whisper_loss=0.07332, over 22047.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09069, over 3888195.45 frames. ], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:30:26,886 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 08:30:52,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-13 08:31:09,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2076730.0, ans=0.125 2024-08-13 08:31:20,629 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 08:31:21,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-13 08:31:27,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2076830.0, ans=0.0 2024-08-13 08:31:34,493 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4800, loss[loss=0.1015, beats_loss=0.00921, ecapa_loss=0.0001887, whisper_loss=0.0904, over 20223.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001694, whisper_loss=0.09106, over 3899003.19 frames. ], batch size: 83, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:31:45,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2076930.0, ans=0.0 2024-08-13 08:31:50,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2077030.0, ans=0.0 2024-08-13 08:31:55,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2077030.0, ans=0.125 2024-08-13 08:32:42,232 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.414e+01 2.705e+01 2.995e+01 6.816e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 08:32:46,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2077330.0, ans=0.1 2024-08-13 08:32:46,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2077330.0, ans=0.0 2024-08-13 08:32:49,604 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4850, loss[loss=0.124, beats_loss=0.01114, ecapa_loss=0.0001639, whisper_loss=0.1113, over 17388.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01093, ecapa_loss=0.0001684, whisper_loss=0.09087, over 3868238.92 frames. ], batch size: 68, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:32:57,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2077430.0, ans=0.0 2024-08-13 08:33:20,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. 
limit=22.5 2024-08-13 08:33:23,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2077630.0, ans=0.1 2024-08-13 08:33:53,341 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 08:34:02,192 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4900, loss[loss=0.1081, beats_loss=0.01003, ecapa_loss=0.0001534, whisper_loss=0.09654, over 20618.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01093, ecapa_loss=0.0001668, whisper_loss=0.0907, over 3871384.57 frames. ], batch size: 77, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:34:27,802 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 08:34:30,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2078130.0, ans=0.125 2024-08-13 08:34:33,417 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 23 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-13 08:34:36,260 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-13 08:34:50,459 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 08:35:06,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.464e+01 2.767e+01 3.041e+01 1.306e+02, threshold=5.534e+01, percent-clipped=2.0 2024-08-13 08:35:11,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2078430.0, ans=0.125 2024-08-13 08:35:12,508 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 4950, loss[loss=0.111, beats_loss=0.01016, ecapa_loss=0.0001473, whisper_loss=0.09936, over 14259.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01088, ecapa_loss=0.0001681, whisper_loss=0.09112, over 3870649.86 frames. 
], batch size: 55, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:35:44,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2078630.0, ans=0.07 2024-08-13 08:36:06,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2078730.0, ans=0.125 2024-08-13 08:36:06,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2078730.0, ans=0.0 2024-08-13 08:36:06,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2078730.0, ans=0.2 2024-08-13 08:36:22,758 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5000, loss[loss=0.0906, beats_loss=0.01138, ecapa_loss=0.0001538, whisper_loss=0.07768, over 15178.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001677, whisper_loss=0.09112, over 3859579.44 frames. ], batch size: 58, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:36:26,996 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 08:36:50,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2079130.0, ans=0.0 2024-08-13 08:36:57,999 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 13 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-13 08:36:59,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2024-08-13 08:37:15,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. 
limit=10.0 2024-08-13 08:37:20,423 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.738e-01 2024-08-13 08:37:22,699 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 08:37:24,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.436e+01 2.730e+01 3.076e+01 4.220e+01, threshold=5.460e+01, percent-clipped=0.0 2024-08-13 08:37:24,304 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 08:37:30,798 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5050, loss[loss=0.09646, beats_loss=0.01407, ecapa_loss=0.0001357, whisper_loss=0.08103, over 22820.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.000166, whisper_loss=0.09163, over 3869822.13 frames. ], batch size: 93, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:37:35,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2079430.0, ans=0.0 2024-08-13 08:37:36,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2079430.0, ans=0.1 2024-08-13 08:37:43,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2079530.0, ans=0.0 2024-08-13 08:37:54,946 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 08:37:55,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2079530.0, ans=0.1 2024-08-13 08:38:13,874 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 18 from Vox, 47 from AS 2024-08-13 08:38:19,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-13 08:38:20,409 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 from AS 2024-08-13 08:38:21,893 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 08:38:27,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2024-08-13 08:38:30,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2079830.0, ans=0.0 2024-08-13 08:38:32,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2079830.0, ans=0.125 2024-08-13 08:38:37,790 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5100, loss[loss=0.06644, beats_loss=0.01686, ecapa_loss=0.0001371, whisper_loss=0.04821, over 16148.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001659, whisper_loss=0.09144, over 3875833.37 frames. ], batch size: 65, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:38:38,004 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 42 from LS+wenet, 17 from Vox, 30 from AS 2024-08-13 08:38:50,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2079930.0, ans=0.125 2024-08-13 08:39:09,462 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 23 from Vox, 20 from AS 2024-08-13 08:39:13,365 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 08:39:17,414 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts.
27 from LS+wenet, 20 from Vox, 44 from AS 2024-08-13 08:39:25,532 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 from AS 2024-08-13 08:39:31,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2080230.0, ans=0.125 2024-08-13 08:39:39,201 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 from AS 2024-08-13 08:39:40,771 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 31 from Vox, 22 from AS 2024-08-13 08:39:41,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.313e+01 2.678e+01 2.870e+01 5.220e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-13 08:39:43,303 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 from AS 2024-08-13 08:39:48,476 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5150, loss[loss=0.1072, beats_loss=0.007741, ecapa_loss=0.0001733, whisper_loss=0.09773, over 15264.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001651, whisper_loss=0.09169, over 3859417.58 frames. ], batch size: 58, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:40:01,052 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 31 from Vox, 31 from AS 2024-08-13 08:40:04,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2080530.0, ans=0.125 2024-08-13 08:40:31,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2080730.0, ans=0.0 2024-08-13 08:40:45,176 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS 2024-08-13 08:40:52,069 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts.
26 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 08:40:53,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2080830.0, ans=0.1 2024-08-13 08:40:57,285 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5200, loss[loss=0.1221, beats_loss=0.01182, ecapa_loss=0.0001462, whisper_loss=0.1088, over 23079.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01086, ecapa_loss=0.0001658, whisper_loss=0.09173, over 3882011.70 frames. ], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:41:02,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2080930.0, ans=0.0 2024-08-13 08:41:05,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:41:26,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2081130.0, ans=0.0 2024-08-13 08:41:26,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2081130.0, ans=0.125 2024-08-13 08:41:27,541 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 from AS 2024-08-13 08:41:27,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2081130.0, ans=0.0 2024-08-13 08:41:56,346 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-13 08:41:58,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2081330.0, ans=0.1 2024-08-13 08:41:59,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.338e+01 2.575e+01 2.873e+01 5.976e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-13 08:42:06,562 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5250, loss[loss=0.09437, beats_loss=0.01076, ecapa_loss=0.0001937, whisper_loss=0.08168, over 16034.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001656, whisper_loss=0.09178, over 3830192.69 frames. ], batch size: 67, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:42:07,938 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 from AS 2024-08-13 08:42:15,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2081430.0, ans=0.2 2024-08-13 08:42:25,749 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 from AS 2024-08-13 08:42:26,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2081530.0, ans=0.125 2024-08-13 08:42:35,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-13 08:42:40,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-08-13 08:42:44,274 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts.
18 from LS+wenet, 24 from Vox, 36 from AS 2024-08-13 08:42:52,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2081730.0, ans=0.0 2024-08-13 08:43:10,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2081830.0, ans=0.2 2024-08-13 08:43:14,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5300, loss[loss=0.1106, beats_loss=0.01016, ecapa_loss=0.0001483, whisper_loss=0.09897, over 23550.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001658, whisper_loss=0.09136, over 3847251.39 frames. ], batch size: 88, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:43:36,954 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 from AS 2024-08-13 08:43:45,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2082130.0, ans=0.125 2024-08-13 08:43:49,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2082130.0, ans=0.2 2024-08-13 08:43:52,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2082130.0, ans=0.1 2024-08-13 08:43:55,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-13 08:43:56,421 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 from AS 2024-08-13 08:44:03,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2082230.0, ans=0.0 2024-08-13 08:44:14,505 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts.
23 from LS+wenet, 12 from Vox, 26 from AS 2024-08-13 08:44:17,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.493e+01 2.716e+01 3.005e+01 4.281e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 08:44:23,010 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 08:44:24,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5350, loss[loss=0.1034, beats_loss=0.01283, ecapa_loss=0.0001342, whisper_loss=0.0892, over 23137.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001652, whisper_loss=0.09207, over 3839965.38 frames. ], batch size: 90, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:44:28,476 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 30 from LS+wenet, 20 from Vox, 20 from AS 2024-08-13 08:45:28,290 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 from AS 2024-08-13 08:45:32,507 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5400, loss[loss=0.09166, beats_loss=0.01409, ecapa_loss=0.0001606, whisper_loss=0.07597, over 20953.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001652, whisper_loss=0.09125, over 3853313.38 frames. ], batch size: 90, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:45:37,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.09 vs.
limit=15.0 2024-08-13 08:46:09,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2083130.0, ans=0.2 2024-08-13 08:46:21,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2083230.0, ans=0.2 2024-08-13 08:46:34,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.479e+01 2.684e+01 3.109e+01 1.549e+02, threshold=5.369e+01, percent-clipped=2.0 2024-08-13 08:46:35,646 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 08:46:35,793 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:46:40,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5450, loss[loss=0.1098, beats_loss=0.01013, ecapa_loss=0.0002004, whisper_loss=0.0977, over 22006.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.000165, whisper_loss=0.09135, over 3881329.85 frames. ], batch size: 89, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:46:51,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2083430.0, ans=0.125 2024-08-13 08:46:59,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2083530.0, ans=0.125 2024-08-13 08:47:01,211 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 37 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 08:47:18,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.85 vs.
limit=15.0 2024-08-13 08:47:42,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2083730.0, ans=0.0 2024-08-13 08:47:42,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2083730.0, ans=0.125 2024-08-13 08:47:43,909 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 from AS 2024-08-13 08:47:56,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2083830.0, ans=0.04949747468305833 2024-08-13 08:47:59,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5500, loss[loss=0.0927, beats_loss=0.01265, ecapa_loss=0.0001388, whisper_loss=0.07866, over 22723.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.000165, whisper_loss=0.0913, over 3916945.10 frames. ], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:48:00,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2083930.0, ans=0.0 2024-08-13 08:48:04,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2083930.0, ans=0.0 2024-08-13 08:48:57,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2084230.0, ans=0.125 2024-08-13 08:48:59,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2084230.0, ans=0.1 2024-08-13 08:49:02,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2084230.0, ans=0.125 2024-08-13 08:49:04,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2084230.0, ans=0.0 2024-08-13 08:49:04,976 INFO [scaling.py:214]
(1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2084230.0, ans=0.125 2024-08-13 08:49:16,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.483e+01 2.752e+01 3.066e+01 5.816e+01, threshold=5.504e+01, percent-clipped=1.0 2024-08-13 08:49:26,429 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5550, loss[loss=0.1033, beats_loss=0.01224, ecapa_loss=0.0001212, whisper_loss=0.08986, over 24171.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001632, whisper_loss=0.09143, over 3934434.94 frames. ], batch size: 92, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:49:47,800 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-13 08:49:53,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2084530.0, ans=0.125 2024-08-13 08:49:59,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2084530.0, ans=0.125 2024-08-13 08:50:17,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-13 08:50:27,362 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 08:50:59,712 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5600, loss[loss=0.1049, beats_loss=0.009155, ecapa_loss=0.0001817, whisper_loss=0.09397, over 23216.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001647, whisper_loss=0.09104, over 3910823.61 frames.
], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:51:14,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2084930.0, ans=0.0 2024-08-13 08:51:25,575 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS 2024-08-13 08:51:27,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2085030.0, ans=0.0 2024-08-13 08:51:51,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2085130.0, ans=0.1 2024-08-13 08:51:59,775 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS 2024-08-13 08:52:04,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2085230.0, ans=0.125 2024-08-13 08:52:33,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.389e+01 2.717e+01 3.076e+01 5.909e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 08:52:43,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5650, loss[loss=0.1183, beats_loss=0.01101, ecapa_loss=0.0001523, whisper_loss=0.1058, over 22273.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001648, whisper_loss=0.09116, over 3916130.98 frames. ], batch size: 88, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:53:11,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2085530.0, ans=0.125 2024-08-13 08:53:25,254 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 from AS 2024-08-13 08:53:27,026 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts.
24 from LS+wenet, 26 from Vox, 28 from AS 2024-08-13 08:53:42,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0 2024-08-13 08:54:11,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2085830.0, ans=0.125 2024-08-13 08:54:14,706 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 18 from LS+wenet, 20 from Vox, 17 from AS 2024-08-13 08:54:17,619 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5700, loss[loss=0.1165, beats_loss=0.009274, ecapa_loss=0.0001493, whisper_loss=0.1058, over 22195.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001662, whisper_loss=0.09074, over 3933938.28 frames. ], batch size: 83, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:54:20,667 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 08:54:25,902 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 18 from LS+wenet, 18 from Vox, 45 from AS 2024-08-13 08:55:01,023 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.772e+01 2024-08-13 08:55:07,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2086230.0, ans=0.2 2024-08-13 08:55:13,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2086230.0, ans=0.125 2024-08-13 08:55:22,369 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts.
26 from LS+wenet, 21 from Vox, 46 from AS 2024-08-13 08:55:25,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.405e+01 2.655e+01 3.007e+01 4.478e+01, threshold=5.310e+01, percent-clipped=0.0 2024-08-13 08:55:33,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5750, loss[loss=0.08685, beats_loss=0.01229, ecapa_loss=0.0001872, whisper_loss=0.07269, over 17056.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001684, whisper_loss=0.09146, over 3937546.33 frames. ], batch size: 71, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:56:00,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2086530.0, ans=0.125 2024-08-13 08:56:00,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2086530.0, ans=15.0 2024-08-13 08:56:02,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2086530.0, ans=0.0 2024-08-13 08:56:10,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2086630.0, ans=0.2 2024-08-13 08:56:11,183 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 from AS 2024-08-13 08:56:20,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2086730.0, ans=0.125 2024-08-13 08:56:23,259 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts.
26 from LS+wenet, 18 from Vox, 40 from AS 2024-08-13 08:56:42,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2086830.0, ans=0.07 2024-08-13 08:56:51,837 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5800, loss[loss=0.06736, beats_loss=0.01569, ecapa_loss=0.0001675, whisper_loss=0.05, over 16277.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001672, whisper_loss=0.0909, over 3915198.45 frames. ], batch size: 69, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:56:58,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2086930.0, ans=0.5 2024-08-13 08:57:03,144 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:57:34,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2087130.0, ans=0.125 2024-08-13 08:57:41,949 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 32 from Vox, 22 from AS 2024-08-13 08:57:45,423 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS 2024-08-13 08:57:59,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2087330.0, ans=0.1 2024-08-13 08:58:02,109 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.414e+01 2.686e+01 3.038e+01 9.495e+01, threshold=5.372e+01, percent-clipped=3.0 2024-08-13 08:58:09,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5850, loss[loss=0.09325, beats_loss=0.01044, ecapa_loss=0.000189, whisper_loss=0.08092, over 17520.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001682, whisper_loss=0.09108, over 3875181.92 frames.
], batch size: 71, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:58:37,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2087530.0, ans=0.025 2024-08-13 08:58:48,881 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 28 from Vox, 27 from AS 2024-08-13 08:58:50,211 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 from AS 2024-08-13 08:58:56,487 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 from AS 2024-08-13 08:58:56,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2087730.0, ans=0.125 2024-08-13 08:58:58,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2087730.0, ans=0.04949747468305833 2024-08-13 08:59:03,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2087730.0, ans=0.0 2024-08-13 08:59:05,095 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 11 from Vox, 32 from AS 2024-08-13 08:59:19,947 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 08:59:22,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2087830.0, ans=0.125 2024-08-13 08:59:28,481 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5900, loss[loss=0.07448, beats_loss=0.01331, ecapa_loss=0.000143, whisper_loss=0.05975, over 14700.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001676, whisper_loss=0.09116, over 3861429.37 frames.
], batch size: 58, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:59:57,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=2088030.0, ans=0.02 2024-08-13 08:59:57,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2088030.0, ans=0.125 2024-08-13 09:00:16,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-13 09:00:39,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.419e+01 2.634e+01 3.004e+01 5.084e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-13 09:00:47,066 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 5950, loss[loss=0.1081, beats_loss=0.01133, ecapa_loss=0.0001467, whisper_loss=0.09533, over 23359.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001673, whisper_loss=0.0906, over 3875391.89 frames. ], batch size: 90, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:00:49,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2088430.0, ans=0.0 2024-08-13 09:01:05,538 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 
16 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 09:01:05,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2088530.0, ans=0.05 2024-08-13 09:01:10,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2088530.0, ans=0.0 2024-08-13 09:01:24,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2088630.0, ans=0.1 2024-08-13 09:01:38,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-08-13 09:01:47,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2088730.0, ans=0.125 2024-08-13 09:01:48,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2088730.0, ans=0.125 2024-08-13 09:01:48,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2088730.0, ans=0.0 2024-08-13 09:01:56,729 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 from AS 2024-08-13 09:01:57,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2088830.0, ans=0.125 2024-08-13 09:02:00,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2088830.0, ans=0.125 2024-08-13 09:02:07,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6000, loss[loss=0.1139, beats_loss=0.009658, ecapa_loss=0.000212, whisper_loss=0.1021, over 23185.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.000168, whisper_loss=0.09138, over 3892092.58 frames.
], batch size: 94, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:02:07,005 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 09:02:46,576 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005583, whisper_loss=0.2489, over 922467.00 frames. 2024-08-13 09:03:03,899 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on SV_voxceleb1: loss=0.004508, beats_loss=0, ecapa_loss=0.0004508, whisper_loss=0, over 939242.00 frames. 2024-08-13 09:05:03,031 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 09:05:03,034 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 09:05:27,575 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-13 09:05:33,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2089130.0, ans=0.125 2024-08-13 09:05:35,291 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2024-08-13 09:05:58,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2089230.0, ans=0.07 2024-08-13 09:05:59,913 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts.
27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 09:06:08,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2089330.0, ans=0.025 2024-08-13 09:06:08,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2089330.0, ans=0.125 2024-08-13 09:06:12,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.429e+01 2.733e+01 3.006e+01 6.424e+01, threshold=5.466e+01, percent-clipped=1.0 2024-08-13 09:06:12,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=12.0 2024-08-13 09:06:19,847 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6050, loss[loss=0.08396, beats_loss=0.01242, ecapa_loss=0.0001621, whisper_loss=0.06992, over 15589.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001668, whisper_loss=0.09129, over 3891061.72 frames. ], batch size: 63, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:06:39,427 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-13 09:06:53,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2089630.0, ans=0.1 2024-08-13 09:07:03,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2089630.0, ans=0.2 2024-08-13 09:07:38,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2089830.0, ans=0.1 2024-08-13 09:07:41,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6100, loss[loss=0.111, beats_loss=0.008358, ecapa_loss=0.0002016, whisper_loss=0.1006, over 17302.00 frames.
], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001673, whisper_loss=0.09197, over 3884743.31 frames. ], batch size: 68, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:07:42,161 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-13 09:07:42,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2089930.0, ans=0.125 2024-08-13 09:08:13,839 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 09:08:18,694 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 09:08:29,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2090130.0, ans=0.0 2024-08-13 09:08:37,700 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 09:08:39,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2090230.0, ans=0.125 2024-08-13 09:08:50,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2024-08-13 09:08:54,468 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 09:08:55,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.306e+01 2.537e+01 2.839e+01 1.271e+02, threshold=5.074e+01, percent-clipped=1.0 2024-08-13 09:08:55,917 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 09:08:56,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2090330.0, ans=0.1 2024-08-13 09:09:03,094 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6150, loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001773, whisper_loss=0.09138, over 14579.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001664, whisper_loss=0.09178, over 3887121.18 frames. ], batch size: 57, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:09:04,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2090430.0, ans=0.125 2024-08-13 09:09:15,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2090430.0, ans=0.125 2024-08-13 09:09:36,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2090630.0, ans=0.125 2024-08-13 09:09:50,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2090730.0, ans=0.125 2024-08-13 09:10:01,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2090730.0, ans=0.125 2024-08-13 09:10:01,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-13 09:10:07,555 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 09:10:17,381 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 09:10:21,808 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
37 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 09:10:23,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6200, loss[loss=0.1304, beats_loss=0.01087, ecapa_loss=0.0001419, whisper_loss=0.1181, over 23379.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001658, whisper_loss=0.09163, over 3905802.78 frames. ], batch size: 90, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:10:33,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2090930.0, ans=0.1 2024-08-13 09:10:33,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2090930.0, ans=0.1 2024-08-13 09:10:45,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2091030.0, ans=0.0 2024-08-13 09:10:51,216 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 09:10:59,622 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-13 09:11:20,265 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 09:11:27,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2091230.0, ans=0.125 2024-08-13 09:11:37,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.446e+01 2.761e+01 3.049e+01 5.001e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-13 09:11:45,298 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6250, loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0001502, whisper_loss=0.09297, over 17437.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001642, whisper_loss=0.09154, over 3935659.50 frames. 
], batch size: 71, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:12:01,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2091530.0, ans=0.125 2024-08-13 09:12:06,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2091530.0, ans=0.1 2024-08-13 09:13:01,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2091830.0, ans=0.0 2024-08-13 09:13:02,641 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-13 09:13:05,969 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6300, loss[loss=0.1118, beats_loss=0.009976, ecapa_loss=0.0001917, whisper_loss=0.09989, over 14687.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001641, whisper_loss=0.09143, over 3909151.80 frames. ], batch size: 57, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:13:11,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2091930.0, ans=0.09899494936611666 2024-08-13 09:13:23,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2092030.0, ans=0.0 2024-08-13 09:13:28,909 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 09:13:37,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2092130.0, ans=0.125 2024-08-13 09:13:40,753 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 
30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 09:13:52,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2092230.0, ans=0.125 2024-08-13 09:13:54,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2092230.0, ans=0.125 2024-08-13 09:14:00,102 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 10 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 09:14:16,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.468e+01 2.785e+01 3.208e+01 1.167e+02, threshold=5.571e+01, percent-clipped=1.0 2024-08-13 09:14:17,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2024-08-13 09:14:23,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2092430.0, ans=0.125 2024-08-13 09:14:23,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2092430.0, ans=0.125 2024-08-13 09:14:24,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6350, loss[loss=0.1122, beats_loss=0.00958, ecapa_loss=0.0001944, whisper_loss=0.1007, over 14046.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001658, whisper_loss=0.09099, over 3901233.18 frames. ], batch size: 55, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:14:28,970 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 09:14:41,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2092530.0, ans=0.125 2024-08-13 09:14:42,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2024-08-13 09:15:06,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2092630.0, ans=0.1 2024-08-13 09:15:30,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2092830.0, ans=0.0 2024-08-13 09:15:35,532 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6400, loss[loss=0.1069, beats_loss=0.0123, ecapa_loss=0.0001638, whisper_loss=0.093, over 22159.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001659, whisper_loss=0.0916, over 3928240.47 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:15:45,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2092930.0, ans=0.125 2024-08-13 09:15:48,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2093030.0, ans=0.125 2024-08-13 09:15:51,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2093030.0, ans=0.0 2024-08-13 09:15:58,887 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 09:16:03,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. 
limit=22.5 2024-08-13 09:16:04,050 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 09:16:10,741 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 09:16:16,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2093230.0, ans=0.125 2024-08-13 09:16:21,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2093230.0, ans=0.125 2024-08-13 09:16:25,086 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 39 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 09:16:34,589 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.483e+01 2.753e+01 3.245e+01 5.103e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-13 09:16:35,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2093330.0, ans=0.0 2024-08-13 09:16:41,184 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6450, loss[loss=0.1034, beats_loss=0.01012, ecapa_loss=0.000185, whisper_loss=0.09139, over 19700.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001666, whisper_loss=0.09142, over 3944692.19 frames. ], batch size: 81, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:16:45,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2093430.0, ans=0.0 2024-08-13 09:16:48,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2093430.0, ans=0.125 2024-08-13 09:17:08,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.73 vs. 
limit=22.5 2024-08-13 09:17:16,724 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 09:17:19,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2093730.0, ans=0.125 2024-08-13 09:17:21,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2093730.0, ans=0.2 2024-08-13 09:17:33,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-13 09:17:42,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-08-13 09:17:46,596 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6500, loss[loss=0.1066, beats_loss=0.007202, ecapa_loss=0.0001872, whisper_loss=0.09753, over 16901.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001671, whisper_loss=0.09179, over 3933805.82 frames. ], batch size: 66, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:17:52,175 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 09:18:03,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-08-13 09:18:08,686 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2024-08-13 09:18:10,510 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 28 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 09:18:26,552 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 
29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 09:18:46,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.544e+01 2.898e+01 3.309e+01 5.602e+01, threshold=5.795e+01, percent-clipped=1.0 2024-08-13 09:18:49,357 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-13 09:18:52,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6550, loss[loss=0.115, beats_loss=0.01069, ecapa_loss=0.0001663, whisper_loss=0.1026, over 20407.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01086, ecapa_loss=0.0001675, whisper_loss=0.0924, over 3951019.10 frames. ], batch size: 79, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:19:10,241 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 40 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 09:19:14,234 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 15 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 09:19:19,867 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-13 09:19:41,170 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 33 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 09:19:41,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=22.5 2024-08-13 09:19:54,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2094830.0, ans=0.1 2024-08-13 09:19:54,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2094830.0, ans=0.125 2024-08-13 09:19:57,574 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6600, loss[loss=0.1128, beats_loss=0.009438, ecapa_loss=0.0002052, whisper_loss=0.1013, over 20941.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01087, ecapa_loss=0.0001671, whisper_loss=0.09262, over 3964340.94 frames. ], batch size: 87, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:19:58,997 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 34 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 09:19:59,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2094930.0, ans=0.2 2024-08-13 09:20:10,990 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 09:20:32,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2095130.0, ans=0.0 2024-08-13 09:20:43,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2095230.0, ans=0.125 2024-08-13 09:20:47,881 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 12 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 09:20:48,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. 
limit=15.0 2024-08-13 09:20:49,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2095330.0, ans=0.2 2024-08-13 09:20:56,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.422e+01 2.623e+01 3.004e+01 7.541e+01, threshold=5.247e+01, percent-clipped=2.0 2024-08-13 09:20:57,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-08-13 09:20:58,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2095330.0, ans=0.125 2024-08-13 09:21:03,437 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6650, loss[loss=0.08941, beats_loss=0.01213, ecapa_loss=0.0002168, whisper_loss=0.07512, over 19955.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001673, whisper_loss=0.09103, over 3932460.55 frames. ], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:21:06,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2095430.0, ans=0.2 2024-08-13 09:21:16,959 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 09:21:29,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-13 09:21:36,354 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
12 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 09:21:54,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2095730.0, ans=0.0 2024-08-13 09:21:59,112 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.355e-02 2024-08-13 09:22:01,709 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 09:22:09,186 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6700, loss[loss=0.1119, beats_loss=0.01034, ecapa_loss=0.0001598, whisper_loss=0.09999, over 13827.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.000167, whisper_loss=0.0919, over 3954804.79 frames. ], batch size: 54, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:22:12,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2024-08-13 09:22:15,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2095930.0, ans=0.0 2024-08-13 09:22:25,432 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 09:22:50,072 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 09:22:56,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2096230.0, ans=0.125 2024-08-13 09:22:58,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-13 09:23:05,586 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 09:23:07,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.450e+01 2.665e+01 3.008e+01 5.668e+01, threshold=5.331e+01, percent-clipped=2.0 2024-08-13 09:23:14,787 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6750, loss[loss=0.08763, beats_loss=0.01127, ecapa_loss=0.0001316, whisper_loss=0.07504, over 17176.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001668, whisper_loss=0.09251, over 3951427.80 frames. ], batch size: 67, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:23:16,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2096430.0, ans=0.2 2024-08-13 09:23:21,653 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 09:24:20,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-08-13 09:24:20,353 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6800, loss[loss=0.1037, beats_loss=0.01022, ecapa_loss=0.0001526, whisper_loss=0.09198, over 21312.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001664, whisper_loss=0.09212, over 3938720.57 frames. ], batch size: 83, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:24:24,693 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.638e+01 2024-08-13 09:25:07,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2097230.0, ans=0.2 2024-08-13 09:25:10,068 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 09:25:13,542 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.83 vs. 
limit=12.0 2024-08-13 09:25:20,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.429e+01 2.619e+01 3.014e+01 5.255e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-13 09:25:26,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2097430.0, ans=0.2 2024-08-13 09:25:27,732 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6850, loss[loss=0.102, beats_loss=0.01011, ecapa_loss=0.0001587, whisper_loss=0.0903, over 16000.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001654, whisper_loss=0.09189, over 3931498.67 frames. ], batch size: 61, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:25:29,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2097430.0, ans=0.125 2024-08-13 09:25:49,976 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 09:26:00,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2097630.0, ans=10.0 2024-08-13 09:26:11,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2097730.0, ans=0.125 2024-08-13 09:26:31,734 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 09:26:33,018 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6900, loss[loss=0.1063, beats_loss=0.01094, ecapa_loss=0.000161, whisper_loss=0.09378, over 21810.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001641, whisper_loss=0.09163, over 3934847.76 frames. 
], batch size: 84, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:26:34,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2024-08-13 09:27:18,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2098230.0, ans=0.125 2024-08-13 09:27:24,565 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2024-08-13 09:27:29,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5 2024-08-13 09:27:32,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.455e+01 2.903e+01 3.270e+01 5.847e+01, threshold=5.807e+01, percent-clipped=1.0 2024-08-13 09:27:39,162 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 6950, loss[loss=0.0648, beats_loss=0.01399, ecapa_loss=8.334e-05, whisper_loss=0.04998, over 15365.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001641, whisper_loss=0.09207, over 3955190.03 frames. ], batch size: 58, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:27:48,754 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 09:27:50,013 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-13 09:27:50,675 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2024-08-13 09:27:50,751 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. 
limit=12.0 2024-08-13 09:27:56,992 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.97 vs. limit=22.5 2024-08-13 09:28:00,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2098530.0, ans=0.125 2024-08-13 09:28:33,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2098830.0, ans=0.025 2024-08-13 09:28:35,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2098830.0, ans=0.0 2024-08-13 09:28:36,962 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 09:28:44,327 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7000, loss[loss=0.122, beats_loss=0.009166, ecapa_loss=0.0001637, whisper_loss=0.1112, over 23496.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01099, ecapa_loss=0.0001644, whisper_loss=0.09138, over 3940832.57 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:28:47,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2098930.0, ans=0.0 2024-08-13 09:28:49,945 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 09:28:55,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2098930.0, ans=0.125 2024-08-13 09:29:04,568 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 09:29:17,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2099130.0, ans=15.0 2024-08-13 09:29:20,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2099130.0, ans=0.02 2024-08-13 09:29:23,495 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 21 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-13 09:29:42,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.399e+01 2.678e+01 3.214e+01 5.831e+01, threshold=5.356e+01, percent-clipped=1.0 2024-08-13 09:29:43,758 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=15.0 2024-08-13 09:29:45,453 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 09:29:49,622 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7050, loss[loss=0.08239, beats_loss=0.01348, ecapa_loss=0.0001518, whisper_loss=0.06739, over 20660.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001648, whisper_loss=0.09092, over 3927519.41 frames. ], batch size: 86, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:30:18,774 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 09:30:23,876 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 09:30:56,396 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 09:31:00,375 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7100, loss[loss=0.09975, beats_loss=0.01165, ecapa_loss=0.0002056, whisper_loss=0.08604, over 18420.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001655, whisper_loss=0.09147, over 3920765.88 frames. ], batch size: 79, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:31:03,844 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 27 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 09:31:32,953 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 09:31:41,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-13 09:31:42,296 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-13 09:32:01,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2100330.0, ans=0.2 2024-08-13 09:32:03,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2100330.0, ans=0.0 2024-08-13 09:32:08,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.488e+01 2.756e+01 3.074e+01 1.860e+02, threshold=5.512e+01, percent-clipped=2.0 2024-08-13 09:32:08,994 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 09:32:14,902 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7150, loss[loss=0.09325, beats_loss=0.01094, ecapa_loss=0.0001808, whisper_loss=0.0805, over 22145.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001659, whisper_loss=0.09172, over 3923575.09 frames. 
], batch size: 91, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:32:35,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2100530.0, ans=0.125
2024-08-13 09:32:37,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2100530.0, ans=0.2
2024-08-13 09:33:08,978 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 from AS
2024-08-13 09:33:10,820 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS
2024-08-13 09:33:16,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=2100830.0, ans=12.0
2024-08-13 09:33:20,922 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 31 from LS+wenet, 17 from Vox, 30 from AS
2024-08-13 09:33:29,683 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7200, loss[loss=0.08737, beats_loss=0.01312, ecapa_loss=0.0001293, whisper_loss=0.07295, over 16546.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001643, whisper_loss=0.09123, over 3908873.81 frames. ], batch size: 66, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:33:47,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2101030.0, ans=0.125
2024-08-13 09:33:59,875 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 from AS
2024-08-13 09:34:02,451 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 09:34:18,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2101230.0, ans=0.125
2024-08-13 09:34:29,152 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts.
28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-13 09:34:38,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.408e+01 2.663e+01 2.960e+01 8.950e+01, threshold=5.327e+01, percent-clipped=1.0
2024-08-13 09:34:44,230 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7250, loss[loss=0.1158, beats_loss=0.01114, ecapa_loss=0.0001068, whisper_loss=0.1036, over 18963.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01099, ecapa_loss=0.0001636, whisper_loss=0.0908, over 3911667.84 frames. ], batch size: 67, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:35:00,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2101530.0, ans=0.125
2024-08-13 09:35:05,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2101530.0, ans=0.125
2024-08-13 09:35:15,953 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS
2024-08-13 09:35:17,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2101630.0, ans=0.0
2024-08-13 09:35:28,183 INFO [train_multi_KD3.py:844] (1/4) A total of 96 cuts. 20 from LS+wenet, 24 from Vox, 52 from AS
2024-08-13 09:35:28,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2101730.0, ans=0.0
2024-08-13 09:35:33,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0
2024-08-13 09:35:41,622 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts.
18 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 09:35:41,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2101730.0, ans=0.1
2024-08-13 09:35:46,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2101830.0, ans=0.0
2024-08-13 09:35:47,596 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 16 from Vox, 15 from AS
2024-08-13 09:35:49,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2101830.0, ans=0.125
2024-08-13 09:35:51,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2101830.0, ans=0.1
2024-08-13 09:35:59,567 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7300, loss[loss=0.132, beats_loss=0.008879, ecapa_loss=0.0002266, whisper_loss=0.1209, over 19605.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001648, whisper_loss=0.09158, over 3877859.14 frames. ], batch size: 78, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:36:02,762 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 26 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 09:36:35,948 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 09:37:08,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.467e+01 2.644e+01 2.965e+01 8.104e+01, threshold=5.287e+01, percent-clipped=3.0
2024-08-13 09:37:13,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2102430.0, ans=0.125
2024-08-13 09:37:14,138 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7350, loss[loss=0.08873, beats_loss=0.01154, ecapa_loss=0.0001893, whisper_loss=0.07529, over 21370.00 frames.
], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001656, whisper_loss=0.09113, over 3846403.18 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:37:20,122 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 34 from Vox, 28 from AS
2024-08-13 09:37:22,321 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.910e+05
2024-08-13 09:37:23,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2102430.0, ans=0.125
2024-08-13 09:37:42,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2102630.0, ans=0.0
2024-08-13 09:38:01,658 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 from AS
2024-08-13 09:38:29,004 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7400, loss[loss=0.1003, beats_loss=0.01034, ecapa_loss=0.0001739, whisper_loss=0.08821, over 22196.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001657, whisper_loss=0.09046, over 3865212.89 frames. ], batch size: 94, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:38:50,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2103030.0, ans=0.0
2024-08-13 09:39:01,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2103130.0, ans=0.2
2024-08-13 09:39:04,654 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 09:39:14,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2103230.0, ans=0.0
2024-08-13 09:39:26,514 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts.
28 from LS+wenet, 19 from Vox, 30 from AS
2024-08-13 09:39:27,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2103230.0, ans=0.125
2024-08-13 09:39:29,791 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS
2024-08-13 09:39:30,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2103330.0, ans=0.0
2024-08-13 09:39:36,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2103330.0, ans=0.2
2024-08-13 09:39:40,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.473e+01 2.699e+01 3.080e+01 4.653e+01, threshold=5.397e+01, percent-clipped=0.0
2024-08-13 09:39:47,334 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7450, loss[loss=0.08923, beats_loss=0.01199, ecapa_loss=0.0002053, whisper_loss=0.07518, over 14972.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001669, whisper_loss=0.09117, over 3882705.30 frames. ], batch size: 63, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:40:06,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0
2024-08-13 09:40:31,748 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 09:40:38,667 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts.
25 from LS+wenet, 21 from Vox, 35 from AS
2024-08-13 09:40:46,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2103730.0, ans=0.0
2024-08-13 09:40:49,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2103830.0, ans=0.125
2024-08-13 09:40:54,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2103830.0, ans=0.0
2024-08-13 09:41:03,521 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7500, loss[loss=0.09914, beats_loss=0.009397, ecapa_loss=0.0001342, whisper_loss=0.08841, over 15343.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01087, ecapa_loss=0.0001669, whisper_loss=0.09092, over 3899237.45 frames. ], batch size: 57, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:41:06,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2103930.0, ans=0.125
2024-08-13 09:41:09,608 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 09:41:23,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2104030.0, ans=0.125
2024-08-13 09:42:11,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.360e+01 2.624e+01 2.937e+01 1.240e+02, threshold=5.248e+01, percent-clipped=1.0
2024-08-13 09:42:14,522 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 from AS
2024-08-13 09:42:17,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7550, loss[loss=0.08781, beats_loss=0.01113, ecapa_loss=0.0002109, whisper_loss=0.07457, over 16179.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001667, whisper_loss=0.09143, over 3882681.80 frames.
], batch size: 68, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:42:26,326 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 14 from LS+wenet, 19 from Vox, 44 from AS
2024-08-13 09:42:41,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2104530.0, ans=0.2
2024-08-13 09:42:49,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2024-08-13 09:42:52,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2104630.0, ans=0.125
2024-08-13 09:42:54,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2104630.0, ans=0.125
2024-08-13 09:42:55,548 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 from AS
2024-08-13 09:43:00,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2104630.0, ans=0.0
2024-08-13 09:43:06,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2104730.0, ans=0.0
2024-08-13 09:43:11,872 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 from AS
2024-08-13 09:43:14,537 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 from AS
2024-08-13 09:43:17,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2104830.0, ans=0.1
2024-08-13 09:43:18,876 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS
2024-08-13 09:43:25,059 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts.
11 from LS+wenet, 28 from Vox, 33 from AS
2024-08-13 09:43:32,205 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7600, loss[loss=0.118, beats_loss=0.01006, ecapa_loss=0.0001705, whisper_loss=0.1062, over 21467.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001669, whisper_loss=0.09166, over 3887799.58 frames. ], batch size: 86, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:44:21,679 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS
2024-08-13 09:44:41,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.428e+01 2.721e+01 3.053e+01 1.709e+02, threshold=5.443e+01, percent-clipped=2.0
2024-08-13 09:44:46,659 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7650, loss[loss=0.1082, beats_loss=0.00974, ecapa_loss=0.0001565, whisper_loss=0.09689, over 14415.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001673, whisper_loss=0.09135, over 3874763.03 frames. ], batch size: 57, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:44:57,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2024-08-13 09:45:00,102 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 from AS
2024-08-13 09:45:11,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2105530.0, ans=0.1
2024-08-13 09:45:13,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs.
limit=15.0
2024-08-13 09:45:17,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2105630.0, ans=0.125
2024-08-13 09:45:26,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2105630.0, ans=0.125
2024-08-13 09:45:27,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2105630.0, ans=0.2
2024-08-13 09:45:29,115 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 25 from LS+wenet, 13 from Vox, 25 from AS
2024-08-13 09:45:29,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.87 vs. limit=10.0
2024-08-13 09:45:41,642 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 22 from LS+wenet, 24 from Vox, 27 from AS
2024-08-13 09:45:43,203 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 28 from Vox, 24 from AS
2024-08-13 09:45:54,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5
2024-08-13 09:46:02,633 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7700, loss[loss=0.0803, beats_loss=0.01151, ecapa_loss=0.000153, whisper_loss=0.06726, over 13854.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001664, whisper_loss=0.09159, over 3892713.51 frames. ], batch size: 55, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:46:10,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2105930.0, ans=0.2
2024-08-13 09:46:21,850 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 from AS
2024-08-13 09:46:24,853 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts.
25 from LS+wenet, 27 from Vox, 35 from AS
2024-08-13 09:46:29,841 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS
2024-08-13 09:46:33,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2106130.0, ans=0.0
2024-08-13 09:46:50,675 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.212e-01
2024-08-13 09:46:58,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2106230.0, ans=0.0
2024-08-13 09:47:00,200 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2024-08-13 09:47:05,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2106330.0, ans=0.125
2024-08-13 09:47:12,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.458e+01 2.712e+01 3.112e+01 4.115e+01, threshold=5.423e+01, percent-clipped=0.0
2024-08-13 09:47:18,023 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7750, loss[loss=0.09095, beats_loss=0.01222, ecapa_loss=0.0001595, whisper_loss=0.07713, over 18218.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001675, whisper_loss=0.09115, over 3881945.21 frames. ], batch size: 72, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:47:23,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0
2024-08-13 09:47:31,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0
2024-08-13 09:47:35,795 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
34 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 09:47:38,613 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 09:47:44,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2106530.0, ans=0.0
2024-08-13 09:47:46,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2106530.0, ans=0.125
2024-08-13 09:47:52,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2106630.0, ans=0.125
2024-08-13 09:47:52,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2106630.0, ans=0.125
2024-08-13 09:47:53,060 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 from AS
2024-08-13 09:47:53,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2106630.0, ans=0.02
2024-08-13 09:48:08,567 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 09:48:21,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2024-08-13 09:48:27,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2024-08-13 09:48:29,440 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 from AS
2024-08-13 09:48:31,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.71 vs.
limit=22.5
2024-08-13 09:48:35,001 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7800, loss[loss=0.1328, beats_loss=0.009113, ecapa_loss=0.0001749, whisper_loss=0.1219, over 15216.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001671, whisper_loss=0.09105, over 3855250.13 frames. ], batch size: 59, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:48:37,971 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 from AS
2024-08-13 09:49:27,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2107230.0, ans=0.025
2024-08-13 09:49:31,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2107230.0, ans=0.125
2024-08-13 09:49:31,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2107230.0, ans=0.09899494936611666
2024-08-13 09:49:32,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2107230.0, ans=0.2
2024-08-13 09:49:45,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.478e+01 2.776e+01 3.061e+01 6.531e+01, threshold=5.553e+01, percent-clipped=2.0
2024-08-13 09:49:45,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2107330.0, ans=0.125
2024-08-13 09:49:48,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2107330.0, ans=0.5
2024-08-13 09:49:48,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2107330.0, ans=0.125
2024-08-13 09:49:51,070 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7850, loss[loss=0.0833, beats_loss=0.01072, ecapa_loss=0.0001708,
whisper_loss=0.07087, over 19022.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01072, ecapa_loss=0.0001665, whisper_loss=0.09234, over 3870530.69 frames. ], batch size: 78, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:49:59,107 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 from AS
2024-08-13 09:50:12,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.62 vs. limit=10.0
2024-08-13 09:50:13,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2107530.0, ans=0.125
2024-08-13 09:50:53,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0
2024-08-13 09:50:54,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2107830.0, ans=0.125
2024-08-13 09:51:02,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2107830.0, ans=0.0
2024-08-13 09:51:08,226 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7900, loss[loss=0.1062, beats_loss=0.01291, ecapa_loss=0.0001147, whisper_loss=0.09212, over 24722.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0108, ecapa_loss=0.0001656, whisper_loss=0.09246, over 3880599.71 frames. ], batch size: 95, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:51:17,839 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts.
31 from LS+wenet, 24 from Vox, 33 from AS
2024-08-13 09:51:26,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2108030.0, ans=0.2
2024-08-13 09:51:40,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2108130.0, ans=0.125
2024-08-13 09:52:05,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2108230.0, ans=0.2
2024-08-13 09:52:18,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=2108330.0, ans=0.02
2024-08-13 09:52:20,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.346e+01 2.630e+01 3.151e+01 7.356e+01, threshold=5.260e+01, percent-clipped=1.0
2024-08-13 09:52:25,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2108430.0, ans=0.2
2024-08-13 09:52:26,808 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 7950, loss[loss=0.1209, beats_loss=0.0069, ecapa_loss=0.000193, whisper_loss=0.1121, over 16671.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01077, ecapa_loss=0.0001652, whisper_loss=0.09274, over 3876626.85 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:52:43,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2108530.0, ans=0.125
2024-08-13 09:52:59,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2108630.0, ans=0.0
2024-08-13 09:53:07,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2108630.0, ans=0.0
2024-08-13 09:53:17,280 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts.
18 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 09:53:20,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2108730.0, ans=0.1
2024-08-13 09:53:45,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8000, loss[loss=0.1248, beats_loss=0.01042, ecapa_loss=0.0001214, whisper_loss=0.1132, over 17083.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001656, whisper_loss=0.09224, over 3858722.40 frames. ], batch size: 61, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:53:47,076 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 09:53:47,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2108930.0, ans=0.125
2024-08-13 09:53:47,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0
2024-08-13 09:53:59,161 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 from AS
2024-08-13 09:54:11,796 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts.
29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-13 09:54:28,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2109130.0, ans=0.125
2024-08-13 09:54:29,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2109130.0, ans=0.0
2024-08-13 09:54:37,528 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.240e-01
2024-08-13 09:54:46,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2109330.0, ans=0.125
2024-08-13 09:54:55,088 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 19 from Vox, 29 from AS
2024-08-13 09:54:56,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.293e+01 2.578e+01 2.886e+01 4.471e+01, threshold=5.156e+01, percent-clipped=0.0
2024-08-13 09:55:02,806 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8050, loss[loss=0.1105, beats_loss=0.01047, ecapa_loss=0.0001637, whisper_loss=0.0984, over 21804.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001645, whisper_loss=0.0918, over 3868720.04 frames. ], batch size: 88, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:55:29,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2109530.0, ans=0.125
2024-08-13 09:55:33,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2109630.0, ans=10.0
2024-08-13 09:56:17,106 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 34 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 09:56:20,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8100, loss[loss=0.09469, beats_loss=0.01245, ecapa_loss=0.0001584, whisper_loss=0.08065, over 22852.00 frames.
], tot_loss[loss=0.1053, beats_loss=0.01083, ecapa_loss=0.0001638, whisper_loss=0.09279, over 3908549.89 frames. ], batch size: 93, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:56:26,717 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 from AS
2024-08-13 09:56:31,753 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-13 09:56:41,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2110030.0, ans=0.125
2024-08-13 09:56:49,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0
2024-08-13 09:56:50,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110130.0, ans=0.1
2024-08-13 09:56:59,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2110130.0, ans=0.2
2024-08-13 09:57:16,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2110230.0, ans=0.125
2024-08-13 09:57:24,102 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5
2024-08-13 09:57:30,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.445e+01 2.691e+01 3.022e+01 6.409e+01, threshold=5.382e+01, percent-clipped=1.0
2024-08-13 09:57:36,919 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8150, loss[loss=0.09636, beats_loss=0.0113, ecapa_loss=0.0001562, whisper_loss=0.0835, over 22761.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01085, ecapa_loss=0.0001642, whisper_loss=0.09233, over 3899733.54 frames.
], batch size: 89, lr: 4.18e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:57:47,751 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 from AS
2024-08-13 09:57:51,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110530.0, ans=0.1
2024-08-13 09:57:51,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2110530.0, ans=0.125
2024-08-13 09:57:52,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2110530.0, ans=0.125
2024-08-13 09:57:56,679 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 16 from Vox, 47 from AS
2024-08-13 09:57:57,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0
2024-08-13 09:58:02,076 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 from AS
2024-08-13 09:58:25,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2110730.0, ans=0.0
2024-08-13 09:58:27,021 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 from AS
2024-08-13 09:58:31,732 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts.
19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 09:58:41,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2110830.0, ans=0.125 2024-08-13 09:58:42,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2110830.0, ans=0.125 2024-08-13 09:58:54,377 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8200, loss[loss=0.08994, beats_loss=0.01295, ecapa_loss=0.0001597, whisper_loss=0.0754, over 18510.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0108, ecapa_loss=0.0001653, whisper_loss=0.09285, over 3918610.54 frames. ], batch size: 75, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:58:54,539 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-13 09:59:04,810 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 20 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 09:59:06,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2110930.0, ans=0.125 2024-08-13 09:59:08,582 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 09:59:10,020 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 09:59:12,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-13 09:59:22,708 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 09:59:23,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.59 vs. 
limit=22.5 2024-08-13 09:59:25,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2111130.0, ans=0.125 2024-08-13 09:59:28,670 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 09:59:38,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2111230.0, ans=0.125 2024-08-13 09:59:48,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2111230.0, ans=0.125 2024-08-13 10:00:08,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.520e+01 2.689e+01 2.972e+01 4.311e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-13 10:00:13,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=12.0 2024-08-13 10:00:14,737 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8250, loss[loss=0.1135, beats_loss=0.01029, ecapa_loss=0.0002041, whisper_loss=0.1012, over 22674.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.000165, whisper_loss=0.09233, over 3919458.68 frames. ], batch size: 94, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:00:17,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2111430.0, ans=0.07 2024-08-13 10:00:50,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2111630.0, ans=0.125 2024-08-13 10:01:08,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2111730.0, ans=0.0 2024-08-13 10:01:26,413 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 10:01:26,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2111830.0, ans=0.0 2024-08-13 10:01:35,583 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8300, loss[loss=0.11, beats_loss=0.009471, ecapa_loss=0.0001312, whisper_loss=0.0992, over 18601.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001636, whisper_loss=0.09141, over 3872441.93 frames. ], batch size: 68, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:02:08,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2112130.0, ans=0.0 2024-08-13 10:02:17,493 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 19 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-13 10:02:37,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2112330.0, ans=0.1 2024-08-13 10:02:46,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.390e+01 2.767e+01 3.084e+01 3.775e+01, threshold=5.535e+01, percent-clipped=0.0 2024-08-13 10:02:46,692 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 10:02:52,861 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8350, loss[loss=0.1132, beats_loss=0.006913, ecapa_loss=0.0001981, whisper_loss=0.1043, over 15253.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01088, ecapa_loss=0.0001636, whisper_loss=0.09115, over 3890600.79 frames. ], batch size: 59, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:03:02,964 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-13 10:03:06,598 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 10:03:08,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2112530.0, ans=0.125 2024-08-13 10:03:08,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2112530.0, ans=0.125 2024-08-13 10:03:12,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.91 vs. limit=22.5 2024-08-13 10:03:17,595 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-13 10:03:39,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2112730.0, ans=0.0 2024-08-13 10:03:45,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-13 10:03:48,336 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 10:03:58,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2112830.0, ans=0.125 2024-08-13 10:04:07,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2112830.0, ans=0.0 2024-08-13 10:04:10,709 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8400, loss[loss=0.1318, beats_loss=0.008719, ecapa_loss=0.0001941, whisper_loss=0.1212, over 17376.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001631, whisper_loss=0.0912, over 3911401.99 frames. 
], batch size: 72, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:04:15,038 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 16 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 10:04:43,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2113130.0, ans=0.0 2024-08-13 10:04:46,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2113130.0, ans=0.5 2024-08-13 10:04:54,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2113130.0, ans=0.125 2024-08-13 10:05:02,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2113230.0, ans=0.1 2024-08-13 10:05:11,256 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 10:05:14,634 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 33 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 10:05:22,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.471e+01 2.703e+01 3.041e+01 5.042e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-13 10:05:28,245 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8450, loss[loss=0.1121, beats_loss=0.01162, ecapa_loss=0.0001709, whisper_loss=0.09876, over 22246.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001645, whisper_loss=0.09179, over 3889595.55 frames. ], batch size: 92, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:05:43,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-13 10:05:44,407 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 10:05:49,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2113530.0, ans=0.125 2024-08-13 10:05:50,778 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 10:06:04,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2024-08-13 10:06:09,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2113630.0, ans=0.125 2024-08-13 10:06:14,729 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 25 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-13 10:06:29,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2113730.0, ans=0.1 2024-08-13 10:06:30,241 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 10:06:48,883 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8500, loss[loss=0.09513, beats_loss=0.008876, ecapa_loss=0.0001593, whisper_loss=0.08466, over 16384.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001627, whisper_loss=0.09164, over 3917921.44 frames. ], batch size: 62, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:06:55,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-13 10:07:08,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.34 vs. 
limit=15.0 2024-08-13 10:07:35,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2114130.0, ans=0.125 2024-08-13 10:07:38,576 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 10:07:53,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-13 10:08:04,263 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.378e+01 2.649e+01 2.972e+01 5.253e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-13 10:08:07,284 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 24 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-13 10:08:10,677 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8550, loss[loss=0.1107, beats_loss=0.01051, ecapa_loss=0.0001594, whisper_loss=0.09861, over 16835.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01082, ecapa_loss=0.0001635, whisper_loss=0.09121, over 3924335.78 frames. ], batch size: 68, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:08:12,066 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 25 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 10:08:13,253 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 10:08:20,140 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 10:08:20,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2114430.0, ans=0.2 2024-08-13 10:08:28,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2114530.0, ans=0.125 2024-08-13 10:08:32,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2114530.0, ans=0.0 2024-08-13 10:08:32,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=12.0 2024-08-13 10:08:40,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2114630.0, ans=0.035 2024-08-13 10:08:43,395 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 17 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 10:09:00,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2114730.0, ans=0.0 2024-08-13 10:09:05,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2114730.0, ans=0.125 2024-08-13 10:09:18,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2114830.0, ans=0.1 2024-08-13 10:09:20,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2114830.0, ans=0.0 2024-08-13 10:09:22,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2114830.0, ans=0.2 2024-08-13 10:09:26,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2114830.0, ans=0.125 
2024-08-13 10:09:31,214 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8600, loss[loss=0.08819, beats_loss=0.01097, ecapa_loss=0.0002025, whisper_loss=0.0752, over 22337.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001649, whisper_loss=0.09207, over 3922531.15 frames. ], batch size: 91, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:09:34,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2114930.0, ans=0.2 2024-08-13 10:09:38,954 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 30 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-13 10:09:42,450 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 10:09:44,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2114930.0, ans=0.125 2024-08-13 10:09:58,081 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 10:10:10,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2115130.0, ans=0.2 2024-08-13 10:10:42,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2115330.0, ans=0.0 2024-08-13 10:10:45,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.403e+01 2.760e+01 3.057e+01 6.734e+01, threshold=5.520e+01, percent-clipped=3.0 2024-08-13 10:10:51,404 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8650, loss[loss=0.08689, beats_loss=0.01073, ecapa_loss=0.0001448, whisper_loss=0.07471, over 18264.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001656, whisper_loss=0.09166, over 3927967.39 frames. 
], batch size: 73, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:11:17,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=12.0 2024-08-13 10:11:24,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2115630.0, ans=0.0 2024-08-13 10:11:45,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2115730.0, ans=0.2 2024-08-13 10:12:02,572 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 10:12:04,216 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 10:12:08,311 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8700, loss[loss=0.1127, beats_loss=0.01054, ecapa_loss=0.000167, whisper_loss=0.1005, over 21913.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001647, whisper_loss=0.09094, over 3909895.87 frames. ], batch size: 88, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:12:13,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-13 10:12:20,826 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 
19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 10:12:28,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2116030.0, ans=0.2 2024-08-13 10:12:34,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2116030.0, ans=0.1 2024-08-13 10:13:13,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2116330.0, ans=0.125 2024-08-13 10:13:15,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-13 10:13:24,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.443e+01 2.656e+01 3.130e+01 5.733e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-13 10:13:27,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2116330.0, ans=0.2 2024-08-13 10:13:30,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8750, loss[loss=0.09301, beats_loss=0.01044, ecapa_loss=0.0001881, whisper_loss=0.0807, over 17705.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001646, whisper_loss=0.09168, over 3867691.45 frames. ], batch size: 73, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:13:31,843 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 10:13:44,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2116530.0, ans=0.125 2024-08-13 10:13:55,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2116530.0, ans=0.2 2024-08-13 10:13:56,490 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 10:14:00,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2116630.0, ans=0.125 2024-08-13 10:14:04,449 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-13 10:14:14,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-08-13 10:14:29,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-13 10:14:49,878 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8800, loss[loss=0.1206, beats_loss=0.009874, ecapa_loss=0.0001516, whisper_loss=0.1092, over 22875.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001638, whisper_loss=0.0921, over 3902908.44 frames. ], batch size: 87, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:14:57,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2116930.0, ans=0.125 2024-08-13 10:15:03,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=12.0 2024-08-13 10:15:10,452 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.629e+00 2024-08-13 10:15:20,213 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 10:15:40,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2117230.0, ans=0.0 2024-08-13 10:15:50,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117230.0, ans=0.1 2024-08-13 10:15:51,760 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 10:16:06,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.410e+01 2.636e+01 2.976e+01 1.522e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 10:16:06,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2117330.0, ans=0.125 2024-08-13 10:16:13,299 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8850, loss[loss=0.09176, beats_loss=0.01386, ecapa_loss=0.0001348, whisper_loss=0.07656, over 13945.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.0001639, whisper_loss=0.09225, over 3900604.67 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:16:28,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2117530.0, ans=0.125 2024-08-13 10:16:40,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2117530.0, ans=0.0 2024-08-13 10:16:58,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-13 10:17:14,759 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 10:17:16,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2024-08-13 10:17:21,667 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 10:17:25,018 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 10:17:28,246 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:17:34,243 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8900, loss[loss=0.1142, beats_loss=0.009475, ecapa_loss=0.0001418, whisper_loss=0.1033, over 17899.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.0001627, whisper_loss=0.09155, over 3894233.64 frames. ], batch size: 67, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:17:36,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2117930.0, ans=0.125 2024-08-13 10:17:56,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2118030.0, ans=0.125 2024-08-13 10:18:01,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2118030.0, ans=0.125 2024-08-13 10:18:35,223 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 10:18:48,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.342e+01 2.664e+01 2.910e+01 6.216e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 10:18:51,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. 
limit=15.0 2024-08-13 10:18:54,501 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 8950, loss[loss=0.1018, beats_loss=0.01008, ecapa_loss=0.0001536, whisper_loss=0.09015, over 17141.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001624, whisper_loss=0.09124, over 3874159.24 frames. ], batch size: 67, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:19:02,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2024-08-13 10:19:29,732 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 10:19:31,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2118630.0, ans=0.0 2024-08-13 10:19:40,234 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 10:19:54,270 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 10:19:55,578 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-13 10:20:10,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2118830.0, ans=0.1 2024-08-13 10:20:11,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2024-08-13 10:20:13,259 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9000, loss[loss=0.1091, beats_loss=0.009599, ecapa_loss=0.0001537, whisper_loss=0.09795, over 19404.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001631, whisper_loss=0.09192, over 3899976.89 frames. 
], batch size: 75, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:20:13,259 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 10:20:54,968 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 10:21:13,536 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on SV_voxceleb1: loss=0.004578, beats_loss=0, ecapa_loss=0.0004578, whisper_loss=0, over 939242.00 frames. 2024-08-13 10:22:46,429 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9178, 3.1316, 3.2439, 2.9293], device='cuda:1') 2024-08-13 10:23:02,625 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 10:23:02,628 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 10:23:17,129 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 10:23:26,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0 2024-08-13 10:23:50,311 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 10:23:50,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2119230.0, ans=0.2 2024-08-13 10:23:58,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2119230.0, ans=0.0 2024-08-13 10:24:07,750 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
18 from LS+wenet, 17 from Vox, 26 from AS 2024-08-13 10:24:18,370 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.388e+01 2.773e+01 3.157e+01 5.459e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-13 10:24:20,857 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2024-08-13 10:24:24,657 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9050, loss[loss=0.1214, beats_loss=0.009971, ecapa_loss=0.0001682, whisper_loss=0.1097, over 18354.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001636, whisper_loss=0.09251, over 3856479.83 frames. ], batch size: 70, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:24:31,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2119430.0, ans=0.125 2024-08-13 10:25:44,360 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9100, loss[loss=0.1018, beats_loss=0.01002, ecapa_loss=0.0001449, whisper_loss=0.09038, over 19447.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01068, ecapa_loss=0.0001645, whisper_loss=0.09337, over 3875135.77 frames. ], batch size: 76, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:26:06,233 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 21 from LS+wenet, 28 from Vox, 45 from AS 2024-08-13 10:26:16,132 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 from AS 2024-08-13 10:26:28,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2120130.0, ans=0.0 2024-08-13 10:26:38,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2120230.0, ans=0.125 2024-08-13 10:26:48,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2120230.0, ans=0.125 2024-08-13 10:26:51,615 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-13 10:26:57,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2120330.0, ans=0.0 2024-08-13 10:27:02,133 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:27:02,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.637e+01 2.940e+01 4.647e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 10:27:04,490 INFO [train_multi_KD3.py:844] (1/4) A total of 53 cuts. 17 from LS+wenet, 12 from Vox, 24 from AS 2024-08-13 10:27:10,229 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9150, loss[loss=0.1149, beats_loss=0.01224, ecapa_loss=0.000132, whisper_loss=0.1013, over 16098.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01074, ecapa_loss=0.0001647, whisper_loss=0.09233, over 3865784.07 frames. 
], batch size: 62, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:27:27,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2120530.0, ans=0.125 2024-08-13 10:27:29,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2120530.0, ans=0.1 2024-08-13 10:27:39,274 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.662e+01 2024-08-13 10:27:44,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0 2024-08-13 10:27:50,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2120630.0, ans=10.0 2024-08-13 10:27:58,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2120730.0, ans=0.0 2024-08-13 10:28:22,837 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 16 from LS+wenet, 27 from Vox, 38 from AS 2024-08-13 10:28:25,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2120830.0, ans=0.125 2024-08-13 10:28:29,889 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9200, loss[loss=0.1082, beats_loss=0.009692, ecapa_loss=0.0001677, whisper_loss=0.09685, over 22002.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01073, ecapa_loss=0.0001651, whisper_loss=0.09239, over 3899467.49 frames. ], batch size: 88, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:28:46,203 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 
18 from LS+wenet, 18 from Vox, 20 from AS 2024-08-13 10:28:52,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2121030.0, ans=0.2 2024-08-13 10:28:59,081 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS 2024-08-13 10:29:17,841 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 from AS 2024-08-13 10:29:31,968 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 23 from LS+wenet, 26 from Vox, 33 from AS 2024-08-13 10:29:39,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-13 10:29:41,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.412e+01 2.586e+01 2.944e+01 1.076e+02, threshold=5.171e+01, percent-clipped=1.0 2024-08-13 10:29:48,971 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9250, loss[loss=0.1093, beats_loss=0.009898, ecapa_loss=0.0001722, whisper_loss=0.09766, over 21736.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001649, whisper_loss=0.09252, over 3907735.52 frames. 
], batch size: 86, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:29:51,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2121430.0, ans=0.0 2024-08-13 10:29:52,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2121430.0, ans=0.125 2024-08-13 10:30:06,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2121530.0, ans=0.125 2024-08-13 10:30:08,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2121530.0, ans=0.0 2024-08-13 10:30:32,382 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 from AS 2024-08-13 10:30:34,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2121630.0, ans=0.125 2024-08-13 10:31:04,427 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 13 from LS+wenet, 20 from Vox, 42 from AS 2024-08-13 10:31:13,835 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9300, loss[loss=0.09401, beats_loss=0.01202, ecapa_loss=0.0002099, whisper_loss=0.07989, over 19246.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001662, whisper_loss=0.09128, over 3919580.38 frames. ], batch size: 84, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:31:18,915 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 from AS 2024-08-13 10:31:28,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2122030.0, ans=0.125 2024-08-13 10:31:37,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.12 vs. 
limit=15.0 2024-08-13 10:31:59,016 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 from AS 2024-08-13 10:32:27,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.387e+01 2.545e+01 2.935e+01 6.659e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-13 10:32:28,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2122330.0, ans=0.2 2024-08-13 10:32:34,605 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9350, loss[loss=0.09008, beats_loss=0.01061, ecapa_loss=0.0001531, whisper_loss=0.07794, over 17771.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001657, whisper_loss=0.09092, over 3882890.14 frames. ], batch size: 71, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:32:44,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2122430.0, ans=0.125 2024-08-13 10:32:48,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2122430.0, ans=0.0 2024-08-13 10:33:07,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2122630.0, ans=0.1 2024-08-13 10:33:09,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2122630.0, ans=0.2 2024-08-13 10:33:27,097 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 20 from LS+wenet, 23 from Vox, 50 from AS 2024-08-13 10:33:40,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2122830.0, ans=0.2 2024-08-13 10:33:44,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2122830.0, ans=0.0 2024-08-13 10:33:48,298 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-13 10:33:55,889 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9400, loss[loss=0.1017, beats_loss=0.01227, ecapa_loss=0.0001726, whisper_loss=0.08771, over 22223.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001664, whisper_loss=0.0907, over 3877658.92 frames. ], batch size: 91, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:33:56,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2122930.0, ans=0.0 2024-08-13 10:34:18,790 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 from AS 2024-08-13 10:34:35,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2123130.0, ans=0.125 2024-08-13 10:34:45,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2123230.0, ans=0.125 2024-08-13 10:34:51,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2123230.0, ans=0.0 2024-08-13 10:35:11,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.356e+01 2.664e+01 2.978e+01 5.324e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-13 10:35:17,189 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9450, loss[loss=0.11, beats_loss=0.01044, ecapa_loss=0.0001922, whisper_loss=0.09767, over 18103.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001652, whisper_loss=0.09084, over 3906074.37 frames. 
], batch size: 73, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:35:19,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2123430.0, ans=0.125 2024-08-13 10:35:22,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2123430.0, ans=0.125 2024-08-13 10:35:27,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2123430.0, ans=0.125 2024-08-13 10:35:37,858 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 from AS 2024-08-13 10:36:25,592 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 from AS 2024-08-13 10:36:33,069 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 19 from LS+wenet, 28 from Vox, 40 from AS 2024-08-13 10:36:41,202 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 27 from LS+wenet, 14 from Vox, 41 from AS 2024-08-13 10:36:42,318 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9500, loss[loss=0.1065, beats_loss=0.01153, ecapa_loss=0.0001364, whisper_loss=0.09361, over 21260.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001659, whisper_loss=0.09094, over 3950509.88 frames. ], batch size: 82, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:36:56,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2123930.0, ans=0.125 2024-08-13 10:37:03,854 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 from AS 2024-08-13 10:37:20,733 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
24 from LS+wenet, 25 from Vox, 43 from AS 2024-08-13 10:37:31,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2124130.0, ans=0.2 2024-08-13 10:37:46,607 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2024-08-13 10:38:13,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2124330.0, ans=0.1 2024-08-13 10:38:17,723 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 from AS 2024-08-13 10:38:22,514 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 29 from LS+wenet, 15 from Vox, 30 from AS 2024-08-13 10:38:25,280 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.632e+01 2024-08-13 10:38:25,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2124330.0, ans=0.0 2024-08-13 10:38:25,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2024-08-13 10:38:28,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.383e+01 2.725e+01 3.152e+01 1.098e+02, threshold=5.450e+01, percent-clipped=1.0 2024-08-13 10:38:38,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9550, loss[loss=0.09981, beats_loss=0.01333, ecapa_loss=0.0001387, whisper_loss=0.08509, over 16270.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001661, whisper_loss=0.09058, over 3918222.61 frames. ], batch size: 63, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:39:05,295 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
23 from LS+wenet, 16 from Vox, 22 from AS 2024-08-13 10:39:15,230 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 from AS 2024-08-13 10:39:18,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2124530.0, ans=0.125 2024-08-13 10:39:23,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2124630.0, ans=0.125 2024-08-13 10:39:35,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2124630.0, ans=0.125 2024-08-13 10:39:35,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2124630.0, ans=0.1 2024-08-13 10:39:37,857 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 10:40:13,088 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-13 10:40:17,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2124830.0, ans=0.0 2024-08-13 10:40:27,937 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9600, loss[loss=0.1203, beats_loss=0.009344, ecapa_loss=0.0001584, whisper_loss=0.1094, over 17862.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.000166, whisper_loss=0.09032, over 3908055.60 frames. ], batch size: 68, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:41:02,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2125130.0, ans=0.2 2024-08-13 10:41:14,119 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 from AS 2024-08-13 10:41:21,962 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 
12 from LS+wenet, 19 from Vox, 30 from AS 2024-08-13 10:41:31,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2125230.0, ans=0.125 2024-08-13 10:41:47,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.442e+01 2.705e+01 2.957e+01 4.182e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-13 10:41:55,720 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9650, loss[loss=0.09158, beats_loss=0.009411, ecapa_loss=0.0002257, whisper_loss=0.07991, over 16861.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001673, whisper_loss=0.09085, over 3871909.99 frames. ], batch size: 72, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:41:58,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2125430.0, ans=0.07 2024-08-13 10:42:02,479 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 from AS 2024-08-13 10:42:15,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2125530.0, ans=0.1 2024-08-13 10:42:17,428 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 10:42:21,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2125530.0, ans=0.125 2024-08-13 10:42:32,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-13 10:42:47,765 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 from AS 2024-08-13 10:43:00,913 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 from AS 2024-08-13 10:43:04,607 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 28 from Vox, 32 from AS 2024-08-13 10:43:16,868 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 10:43:27,831 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9700, loss[loss=0.1088, beats_loss=0.009623, ecapa_loss=0.0002174, whisper_loss=0.097, over 15318.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001669, whisper_loss=0.09109, over 3885807.14 frames. ], batch size: 64, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:43:28,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2125930.0, ans=0.0 2024-08-13 10:44:26,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2126230.0, ans=0.125 2024-08-13 10:45:07,401 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 from AS 2024-08-13 10:45:09,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.435e+01 2.595e+01 3.006e+01 3.939e+01, threshold=5.189e+01, percent-clipped=0.0 2024-08-13 10:45:16,834 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9750, loss[loss=0.1054, beats_loss=0.01065, ecapa_loss=0.0001749, whisper_loss=0.09302, over 16063.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001669, whisper_loss=0.09126, over 3856847.75 frames. ], batch size: 66, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:45:26,262 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS 2024-08-13 10:45:45,374 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 from AS 2024-08-13 10:45:55,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. 
limit=15.0 2024-08-13 10:45:57,024 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS 2024-08-13 10:46:14,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2126630.0, ans=0.0 2024-08-13 10:46:43,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2126730.0, ans=0.125 2024-08-13 10:46:43,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2024-08-13 10:46:49,132 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-13 10:47:03,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2126830.0, ans=0.05 2024-08-13 10:47:07,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2126830.0, ans=0.125 2024-08-13 10:47:12,455 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9800, loss[loss=0.09991, beats_loss=0.01001, ecapa_loss=0.0001869, whisper_loss=0.08804, over 17215.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001669, whisper_loss=0.09088, over 3856401.94 frames. ], batch size: 69, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:47:12,534 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-13 10:47:15,672 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
38 from LS+wenet, 21 from Vox, 31 from AS 2024-08-13 10:47:23,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2126930.0, ans=0.0 2024-08-13 10:47:23,976 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-13 10:47:25,511 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-13 10:47:45,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2127030.0, ans=0.125 2024-08-13 10:47:45,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. limit=10.0 2024-08-13 10:48:00,505 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:48:08,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2127130.0, ans=0.125 2024-08-13 10:48:22,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-13 10:48:22,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.57 vs. 
limit=15.0 2024-08-13 10:48:40,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2127230.0, ans=0.2 2024-08-13 10:49:04,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.353e+01 2.628e+01 3.072e+01 7.221e+01, threshold=5.255e+01, percent-clipped=1.0 2024-08-13 10:49:12,309 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9850, loss[loss=0.1053, beats_loss=0.01272, ecapa_loss=0.0001648, whisper_loss=0.09096, over 21425.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001655, whisper_loss=0.09105, over 3895524.03 frames. ], batch size: 87, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:49:46,260 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 from AS 2024-08-13 10:49:50,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2127530.0, ans=0.125 2024-08-13 10:50:14,784 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 from AS 2024-08-13 10:50:21,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2127730.0, ans=0.125 2024-08-13 10:50:26,610 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS 2024-08-13 10:50:39,566 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 10:50:40,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-13 10:50:40,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.43 vs. 
limit=22.5 2024-08-13 10:50:47,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2127830.0, ans=0.125 2024-08-13 10:50:54,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2127830.0, ans=0.125 2024-08-13 10:50:55,934 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 from AS 2024-08-13 10:51:00,184 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-13 10:51:05,646 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9900, loss[loss=0.1022, beats_loss=0.01302, ecapa_loss=0.0001346, whisper_loss=0.08781, over 21425.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001638, whisper_loss=0.09101, over 3857780.50 frames. ], batch size: 87, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:51:14,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2127930.0, ans=0.2 2024-08-13 10:51:23,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2128030.0, ans=0.125 2024-08-13 10:52:12,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2128330.0, ans=0.125 2024-08-13 10:52:18,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.402e+01 2.725e+01 3.042e+01 4.728e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 10:52:23,187 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 9950, loss[loss=0.09143, beats_loss=0.01193, ecapa_loss=0.0001721, whisper_loss=0.07779, over 20990.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.0001642, whisper_loss=0.09061, over 3875845.71 frames. 
], batch size: 87, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:52:52,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2128530.0, ans=0.125 2024-08-13 10:53:18,162 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 from AS 2024-08-13 10:53:24,211 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 21 from LS+wenet, 11 from Vox, 33 from AS 2024-08-13 10:53:36,492 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 12 from Vox, 31 from AS 2024-08-13 10:53:42,744 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10000, loss[loss=0.1119, beats_loss=0.009214, ecapa_loss=0.0001633, whisper_loss=0.101, over 17152.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001638, whisper_loss=0.09136, over 3875747.23 frames. ], batch size: 64, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:53:44,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2128930.0, ans=0.0 2024-08-13 10:54:01,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2129030.0, ans=0.125 2024-08-13 10:54:04,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2129030.0, ans=0.1 2024-08-13 10:54:37,765 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 from AS 2024-08-13 10:54:42,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2024-08-13 10:54:46,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. 
limit=15.0 2024-08-13 10:54:53,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.85 vs. limit=22.5 2024-08-13 10:54:54,515 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 32 from LS+wenet, 24 from Vox, 39 from AS 2024-08-13 10:54:54,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2129330.0, ans=0.04949747468305833 2024-08-13 10:54:57,443 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.402e+01 2.704e+01 2.977e+01 9.053e+01, threshold=5.409e+01, percent-clipped=1.0 2024-08-13 10:54:59,407 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 from AS 2024-08-13 10:55:02,762 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10050, loss[loss=0.08453, beats_loss=0.008542, ecapa_loss=0.0001913, whisper_loss=0.07408, over 14483.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001638, whisper_loss=0.09215, over 3894973.18 frames. ], batch size: 57, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:55:04,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2129430.0, ans=0.125 2024-08-13 10:55:37,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2129630.0, ans=0.5 2024-08-13 10:55:39,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-13 10:55:42,621 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
23 from LS+wenet, 22 from Vox, 39 from AS 2024-08-13 10:55:47,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2129630.0, ans=0.1 2024-08-13 10:56:00,133 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-13 10:56:11,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2129830.0, ans=0.025 2024-08-13 10:56:25,410 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10100, loss[loss=0.1118, beats_loss=0.01106, ecapa_loss=0.0001719, whisper_loss=0.09899, over 23123.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001638, whisper_loss=0.09156, over 3889298.72 frames. ], batch size: 92, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:56:32,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2129930.0, ans=0.125 2024-08-13 10:56:32,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2129930.0, ans=0.125 2024-08-13 10:57:05,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2130130.0, ans=0.0 2024-08-13 10:57:20,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2130230.0, ans=0.125 2024-08-13 10:57:31,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2130330.0, ans=0.1 2024-08-13 10:57:41,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. 
limit=15.0 2024-08-13 10:57:42,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.392e+01 2.656e+01 2.956e+01 4.246e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-13 10:57:42,378 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 26 from LS+wenet, 32 from Vox, 33 from AS 2024-08-13 10:57:46,725 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10150, loss[loss=0.1249, beats_loss=0.008415, ecapa_loss=0.0001583, whisper_loss=0.1149, over 23129.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001663, whisper_loss=0.09185, over 3883119.22 frames. ], batch size: 88, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:57:50,290 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 10:57:55,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2024-08-13 10:58:03,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2130530.0, ans=0.0 2024-08-13 10:58:14,129 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 from AS 2024-08-13 10:58:19,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2130630.0, ans=0.0 2024-08-13 10:58:35,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2130730.0, ans=0.125 2024-08-13 10:58:35,517 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-13 10:58:58,386 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 
18 from LS+wenet, 14 from Vox, 31 from AS 2024-08-13 10:58:58,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2130830.0, ans=0.2 2024-08-13 10:59:06,768 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10200, loss[loss=0.1318, beats_loss=0.00806, ecapa_loss=0.0001835, whisper_loss=0.1219, over 18900.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0107, ecapa_loss=0.0001658, whisper_loss=0.09287, over 3904611.03 frames. ], batch size: 74, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:59:11,895 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 from AS 2024-08-13 10:59:29,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2131030.0, ans=0.125 2024-08-13 10:59:31,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2131030.0, ans=0.125 2024-08-13 10:59:36,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-13 10:59:40,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2131130.0, ans=0.0 2024-08-13 10:59:40,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2131130.0, ans=0.125 2024-08-13 10:59:43,641 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
24 from LS+wenet, 21 from Vox, 36 from AS 2024-08-13 10:59:43,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2131130.0, ans=0.125 2024-08-13 10:59:55,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2131230.0, ans=0.0 2024-08-13 10:59:56,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2131230.0, ans=0.0 2024-08-13 11:00:05,739 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-13 11:00:22,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.425e+01 2.688e+01 3.008e+01 5.255e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 11:00:27,140 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10250, loss[loss=0.1107, beats_loss=0.01115, ecapa_loss=0.0001353, whisper_loss=0.09824, over 22616.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.000166, whisper_loss=0.09198, over 3926361.49 frames. ], batch size: 88, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:00:47,982 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 11:01:10,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2131630.0, ans=0.05 2024-08-13 11:01:25,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-08-13 11:01:26,586 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 18 from LS+wenet, 18 from Vox, 37 from AS 2024-08-13 11:01:36,623 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 
29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 11:01:49,434 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10300, loss[loss=0.124, beats_loss=0.01096, ecapa_loss=0.0001531, whisper_loss=0.1115, over 15111.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.000166, whisper_loss=0.09154, over 3882637.02 frames. ], batch size: 58, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:01:56,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2131930.0, ans=15.0 2024-08-13 11:02:05,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-08-13 11:02:29,122 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 11:02:37,523 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 11:02:40,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2132230.0, ans=0.125 2024-08-13 11:02:58,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2024-08-13 11:03:01,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-13 11:03:03,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.417e+01 2.741e+01 3.040e+01 4.375e+02, threshold=5.481e+01, percent-clipped=2.0 2024-08-13 11:03:07,869 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10350, loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001883, whisper_loss=0.09013, over 21316.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001669, whisper_loss=0.09177, over 3901410.35 frames. ], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:03:33,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.85 vs. limit=10.0 2024-08-13 11:03:34,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2132530.0, ans=0.125 2024-08-13 11:03:36,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2132530.0, ans=0.1 2024-08-13 11:03:45,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-13 11:03:50,198 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-13 11:04:06,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2132730.0, ans=0.0 2024-08-13 11:04:08,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2132730.0, ans=0.125 2024-08-13 11:04:24,895 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10400, loss[loss=0.103, beats_loss=0.01146, ecapa_loss=0.0001578, whisper_loss=0.09, over 19655.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001658, whisper_loss=0.09125, over 3868489.00 frames. 
], batch size: 79, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:04:25,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2132930.0, ans=0.125 2024-08-13 11:04:46,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2133030.0, ans=0.1 2024-08-13 11:04:55,076 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 11:05:01,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2133130.0, ans=0.1 2024-08-13 11:05:10,169 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 11:05:10,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2133230.0, ans=0.0 2024-08-13 11:05:28,327 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 11:05:37,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.409e+01 2.723e+01 2.969e+01 5.956e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-13 11:05:38,116 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-13 11:05:39,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2133330.0, ans=0.2 2024-08-13 11:05:42,395 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10450, loss[loss=0.1085, beats_loss=0.01115, ecapa_loss=0.0001631, whisper_loss=0.09574, over 21184.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001655, whisper_loss=0.09096, over 3877880.45 frames. 
], batch size: 83, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:05:45,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2133430.0, ans=0.2 2024-08-13 11:06:06,587 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 24 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-13 11:06:08,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2133530.0, ans=0.125 2024-08-13 11:06:12,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=12.0 2024-08-13 11:06:24,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-13 11:06:48,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2133830.0, ans=0.125 2024-08-13 11:06:58,829 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10500, loss[loss=0.09602, beats_loss=0.01283, ecapa_loss=0.0001594, whisper_loss=0.0816, over 22089.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01085, ecapa_loss=0.0001662, whisper_loss=0.09061, over 3868834.02 frames. ], batch size: 89, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:07:01,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2133930.0, ans=0.025 2024-08-13 11:07:03,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2133930.0, ans=0.125 2024-08-13 11:07:09,950 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. 
limit=22.5 2024-08-13 11:07:11,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2133930.0, ans=0.1 2024-08-13 11:07:22,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2134030.0, ans=10.0 2024-08-13 11:07:27,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2024-08-13 11:07:31,136 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 19 from LS+wenet, 24 from Vox, 52 fro AS 2024-08-13 11:07:34,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2134130.0, ans=0.125 2024-08-13 11:07:41,767 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 11:07:42,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-13 11:07:57,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2134230.0, ans=0.0 2024-08-13 11:08:12,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.440e+01 2.652e+01 2.992e+01 8.819e+01, threshold=5.304e+01, percent-clipped=1.0 2024-08-13 11:08:12,829 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 16 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-13 11:08:17,217 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10550, loss[loss=0.107, beats_loss=0.009792, ecapa_loss=0.0001385, whisper_loss=0.09586, over 23947.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001654, whisper_loss=0.09133, over 3882830.23 frames. 
], batch size: 91, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:08:28,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2134430.0, ans=0.1 2024-08-13 11:08:31,470 INFO [train_multi_KD3.py:844] (1/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 11:08:31,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2134530.0, ans=0.125 2024-08-13 11:08:49,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-08-13 11:09:24,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2134830.0, ans=0.125 2024-08-13 11:09:26,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-13 11:09:32,363 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2134830.0, ans=0.2 2024-08-13 11:09:38,568 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10600, loss[loss=0.09599, beats_loss=0.009859, ecapa_loss=0.0001988, whisper_loss=0.08414, over 16978.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001657, whisper_loss=0.0914, over 3872454.19 frames. ], batch size: 70, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:09:50,869 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 11:10:11,871 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 11:10:32,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2135230.0, ans=0.125 2024-08-13 11:10:34,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2135230.0, ans=0.125 2024-08-13 11:10:40,126 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 11:10:43,109 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 11:10:46,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2135330.0, ans=0.0 2024-08-13 11:10:52,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.455e+01 2.918e+01 3.137e+01 4.464e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-13 11:10:57,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10650, loss[loss=0.09836, beats_loss=0.01035, ecapa_loss=0.0001521, whisper_loss=0.08649, over 17439.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001646, whisper_loss=0.09173, over 3882878.12 frames. ], batch size: 68, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:11:21,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2135530.0, ans=0.125 2024-08-13 11:11:21,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2135530.0, ans=0.1 2024-08-13 11:11:26,184 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 11:11:30,010 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. 
limit=15.0 2024-08-13 11:11:42,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2135630.0, ans=0.125 2024-08-13 11:11:49,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2024-08-13 11:11:58,014 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 36 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 11:12:06,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2135830.0, ans=0.125 2024-08-13 11:12:15,106 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10700, loss[loss=0.07865, beats_loss=0.01264, ecapa_loss=0.0001524, whisper_loss=0.06448, over 14112.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001625, whisper_loss=0.0914, over 3918243.82 frames. ], batch size: 57, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:12:32,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2136030.0, ans=0.125 2024-08-13 11:12:40,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2136030.0, ans=0.0 2024-08-13 11:13:01,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2136230.0, ans=0.1 2024-08-13 11:13:06,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-13 11:13:06,824 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
31 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-13 11:13:07,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2136230.0, ans=0.125 2024-08-13 11:13:16,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2136330.0, ans=0.1 2024-08-13 11:13:19,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2136330.0, ans=0.0 2024-08-13 11:13:25,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2136330.0, ans=0.125 2024-08-13 11:13:26,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.457e+01 2.823e+01 3.286e+01 3.691e+02, threshold=5.645e+01, percent-clipped=1.0 2024-08-13 11:13:26,577 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 11:13:31,246 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10750, loss[loss=0.114, beats_loss=0.01146, ecapa_loss=0.0001154, whisper_loss=0.1013, over 14573.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001632, whisper_loss=0.09168, over 3919581.77 frames. ], batch size: 55, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:13:42,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2024-08-13 11:13:45,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2136530.0, ans=0.125 2024-08-13 11:13:48,522 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 37 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 11:14:08,986 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 11:14:21,266 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-13 11:14:23,016 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 11:14:29,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=12.0 2024-08-13 11:14:30,262 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 11:14:47,076 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10800, loss[loss=0.09893, beats_loss=0.009811, ecapa_loss=0.0001404, whisper_loss=0.08772, over 21481.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001642, whisper_loss=0.09165, over 3921536.00 frames. ], batch size: 83, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:15:10,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2137030.0, ans=0.0 2024-08-13 11:15:23,469 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 11:15:24,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-13 11:15:25,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2137130.0, ans=0.0 2024-08-13 11:15:28,867 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
21 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 11:15:46,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2137330.0, ans=0.125 2024-08-13 11:15:48,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-13 11:15:49,726 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5 2024-08-13 11:15:56,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.544e+01 2.753e+01 3.369e+01 1.648e+02, threshold=5.506e+01, percent-clipped=4.0 2024-08-13 11:16:00,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10850, loss[loss=0.1129, beats_loss=0.01049, ecapa_loss=0.0001466, whisper_loss=0.1009, over 21111.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.000165, whisper_loss=0.09178, over 3971867.33 frames. ], batch size: 80, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:16:13,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2137430.0, ans=10.0 2024-08-13 11:16:17,833 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 38 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-13 11:16:34,871 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0 2024-08-13 11:16:46,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2137730.0, ans=0.09899494936611666 2024-08-13 11:16:53,478 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 11:17:13,121 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
25 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 11:17:15,051 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 11:17:16,510 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10900, loss[loss=0.1204, beats_loss=0.009197, ecapa_loss=0.0001543, whisper_loss=0.1097, over 22110.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01086, ecapa_loss=0.0001649, whisper_loss=0.09292, over 3983019.39 frames. ], batch size: 87, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:17:21,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2137930.0, ans=0.125 2024-08-13 11:17:24,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2137930.0, ans=0.125 2024-08-13 11:17:25,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2137930.0, ans=0.2 2024-08-13 11:17:29,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2137930.0, ans=0.2 2024-08-13 11:17:44,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2138130.0, ans=0.0 2024-08-13 11:17:47,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-13 11:17:50,458 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 11:17:51,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2138130.0, ans=0.04949747468305833 2024-08-13 11:18:20,800 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
19 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-13 11:18:22,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2024-08-13 11:18:26,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.488e+01 2.800e+01 3.283e+01 5.415e+01, threshold=5.600e+01, percent-clipped=0.0 2024-08-13 11:18:27,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2138330.0, ans=0.125 2024-08-13 11:18:30,680 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 10950, loss[loss=0.1132, beats_loss=0.01034, ecapa_loss=0.0001593, whisper_loss=0.1012, over 22852.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001652, whisper_loss=0.09236, over 3973308.97 frames. ], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:18:31,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:18:36,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2138430.0, ans=0.125 2024-08-13 11:18:40,440 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 11:18:40,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. 
limit=15.0 2024-08-13 11:18:42,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2138430.0, ans=0.07 2024-08-13 11:19:25,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2138730.0, ans=0.125 2024-08-13 11:19:28,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.24 vs. limit=22.5 2024-08-13 11:19:30,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-13 11:19:34,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2138830.0, ans=0.1 2024-08-13 11:19:42,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2138830.0, ans=0.1 2024-08-13 11:19:48,416 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11000, loss[loss=0.1045, beats_loss=0.007913, ecapa_loss=0.000194, whisper_loss=0.09469, over 22027.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01082, ecapa_loss=0.0001663, whisper_loss=0.09243, over 3963961.63 frames. ], batch size: 88, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:19:50,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2138930.0, ans=0.2 2024-08-13 11:20:08,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.11 vs. 
limit=15.0 2024-08-13 11:20:13,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2139030.0, ans=0.125 2024-08-13 11:20:25,743 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.623e+01 2024-08-13 11:20:31,065 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 11:20:58,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.481e+01 2.729e+01 3.286e+01 1.330e+02, threshold=5.458e+01, percent-clipped=4.0 2024-08-13 11:21:03,173 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11050, loss[loss=0.1061, beats_loss=0.01051, ecapa_loss=0.0001733, whisper_loss=0.09382, over 22716.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001654, whisper_loss=0.09122, over 3926545.37 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:21:05,082 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-13 11:21:06,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2139430.0, ans=0.2 2024-08-13 11:21:13,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2139430.0, ans=0.0 2024-08-13 11:21:15,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2139430.0, ans=0.125 2024-08-13 11:21:23,245 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 11:21:25,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2024-08-13 11:21:34,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2139530.0, ans=0.0 2024-08-13 11:21:38,381 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 11:21:45,790 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 11:21:47,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2139630.0, ans=0.0 2024-08-13 11:21:49,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2139630.0, ans=0.09899494936611666 2024-08-13 11:22:11,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2139730.0, ans=0.0 2024-08-13 11:22:38,438 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11100, loss[loss=0.1068, beats_loss=0.008326, ecapa_loss=0.0002207, whisper_loss=0.09627, over 14826.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001646, whisper_loss=0.09127, over 3925106.70 frames. 
], batch size: 61, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:22:42,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2139930.0, ans=0.125 2024-08-13 11:22:51,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2139930.0, ans=0.125 2024-08-13 11:23:03,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2140030.0, ans=0.0 2024-08-13 11:23:28,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2140130.0, ans=0.125 2024-08-13 11:23:53,955 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 11:24:02,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2140330.0, ans=0.125 2024-08-13 11:24:08,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2140330.0, ans=0.1 2024-08-13 11:24:11,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.487e+01 2.717e+01 3.069e+01 5.884e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 11:24:15,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2140430.0, ans=0.2 2024-08-13 11:24:16,380 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11150, loss[loss=0.1036, beats_loss=0.009839, ecapa_loss=0.0001426, whisper_loss=0.09229, over 23418.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001646, whisper_loss=0.09154, over 3948360.81 frames. 
], batch size: 91, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:24:22,997 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:24:25,582 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-13 11:24:35,514 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 32 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 11:24:42,376 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 11:24:47,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2140630.0, ans=0.0 2024-08-13 11:25:13,653 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 25 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 11:25:30,064 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11200, loss[loss=0.07933, beats_loss=0.01387, ecapa_loss=0.0001224, whisper_loss=0.06423, over 22153.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01083, ecapa_loss=0.0001642, whisper_loss=0.0916, over 3932737.62 frames. ], batch size: 91, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:25:32,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2140930.0, ans=0.125 2024-08-13 11:25:37,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2140930.0, ans=0.125 2024-08-13 11:25:50,502 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 19 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-13 11:26:18,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2141230.0, ans=6.0 2024-08-13 11:26:31,376 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 11:26:31,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2141330.0, ans=0.125 2024-08-13 11:26:31,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2141330.0, ans=0.1 2024-08-13 11:26:32,014 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-08-13 11:26:39,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.420e+01 2.628e+01 2.915e+01 3.904e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-13 11:26:39,340 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 11:26:43,773 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11250, loss[loss=0.09079, beats_loss=0.01219, ecapa_loss=0.0001824, whisper_loss=0.07678, over 21460.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001645, whisper_loss=0.09161, over 3912224.42 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:27:04,819 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-13 11:27:09,647 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 11:27:15,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2141630.0, ans=0.1 2024-08-13 11:27:40,719 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.011e+01 2024-08-13 11:27:43,573 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 
21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 11:27:45,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2141830.0, ans=0.125 2024-08-13 11:27:57,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11300, loss[loss=0.07948, beats_loss=0.0128, ecapa_loss=0.0001674, whisper_loss=0.06501, over 15571.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.000165, whisper_loss=0.09113, over 3901859.34 frames. ], batch size: 68, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:27:59,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2141930.0, ans=0.0 2024-08-13 11:28:08,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=22.5 2024-08-13 11:28:15,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2142030.0, ans=0.2 2024-08-13 11:28:16,898 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 11:28:18,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2142030.0, ans=0.125 2024-08-13 11:28:25,278 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 11:28:31,821 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 11:28:33,004 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 
35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 11:28:33,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2142130.0, ans=0.1 2024-08-13 11:28:43,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2142230.0, ans=0.05 2024-08-13 11:28:50,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2142230.0, ans=0.125 2024-08-13 11:28:51,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0 2024-08-13 11:29:07,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.518e+01 2.742e+01 3.086e+01 4.928e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-13 11:29:11,382 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11350, loss[loss=0.1119, beats_loss=0.008276, ecapa_loss=0.0001435, whisper_loss=0.1022, over 14558.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01078, ecapa_loss=0.0001656, whisper_loss=0.09105, over 3895243.98 frames. ], batch size: 55, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:29:21,679 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 11:29:34,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2142530.0, ans=0.1 2024-08-13 11:29:51,564 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-13 11:29:52,884 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-13 11:29:58,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2142730.0, ans=0.125 2024-08-13 11:30:08,739 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:30:17,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2142830.0, ans=10.0 2024-08-13 11:30:20,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2142830.0, ans=0.125 2024-08-13 11:30:25,346 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11400, loss[loss=0.1146, beats_loss=0.009472, ecapa_loss=0.0001631, whisper_loss=0.1035, over 19752.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001654, whisper_loss=0.0913, over 3906701.79 frames. ], batch size: 79, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:30:49,226 INFO [train_multi_KD3.py:844] (1/4) A total of 95 cuts. 29 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 11:30:50,589 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 10 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 11:30:53,745 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 11:31:29,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2143330.0, ans=0.125 2024-08-13 11:31:37,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.547e+01 2.847e+01 3.262e+01 4.632e+01, threshold=5.695e+01, percent-clipped=0.0 2024-08-13 11:31:42,364 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11450, loss[loss=0.1317, beats_loss=0.007724, ecapa_loss=0.0002248, whisper_loss=0.1217, over 21984.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001651, whisper_loss=0.09127, over 3932083.80 frames. ], batch size: 93, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:31:51,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2143430.0, ans=0.125 2024-08-13 11:32:01,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2143530.0, ans=0.2 2024-08-13 11:32:16,177 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 30 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 11:32:28,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2143730.0, ans=0.125 2024-08-13 11:32:30,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2143730.0, ans=0.09899494936611666 2024-08-13 11:32:40,012 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 11:32:41,445 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 15 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 11:32:41,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2143730.0, ans=0.0 2024-08-13 11:32:43,044 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 11:32:58,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2143930.0, ans=0.125 2024-08-13 11:33:00,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11500, loss[loss=0.1169, beats_loss=0.008888, ecapa_loss=0.0001577, whisper_loss=0.1065, over 20234.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001664, whisper_loss=0.09131, over 3916349.52 frames. 
], batch size: 79, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:33:10,344 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 11:33:19,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2144030.0, ans=0.125 2024-08-13 11:33:20,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-13 11:33:25,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2144030.0, ans=0.0 2024-08-13 11:33:31,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2144130.0, ans=0.125 2024-08-13 11:33:40,228 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-13 11:33:45,317 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 11:33:46,460 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 11:33:51,177 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 11:34:07,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 11:34:10,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.467e+01 2.720e+01 3.175e+01 4.456e+01, threshold=5.439e+01, percent-clipped=0.0 2024-08-13 11:34:14,706 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11550, loss[loss=0.09086, beats_loss=0.01162, ecapa_loss=0.0001512, whisper_loss=0.07773, over 17036.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001656, whisper_loss=0.09056, over 3908542.43 frames. 
], batch size: 68, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:34:25,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=12.0 2024-08-13 11:34:33,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2144530.0, ans=0.125 2024-08-13 11:34:39,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2144530.0, ans=0.09899494936611666 2024-08-13 11:34:40,864 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.025e-01 2024-08-13 11:34:45,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2144630.0, ans=0.125 2024-08-13 11:34:45,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2144630.0, ans=0.125 2024-08-13 11:34:58,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2144730.0, ans=0.015 2024-08-13 11:35:03,010 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 11:35:11,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2144730.0, ans=0.0 2024-08-13 11:35:13,977 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 11:35:18,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2144830.0, ans=0.125 2024-08-13 11:35:24,947 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 
17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 11:35:29,248 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11600, loss[loss=0.1114, beats_loss=0.01004, ecapa_loss=0.0001346, whisper_loss=0.1001, over 20904.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001655, whisper_loss=0.09069, over 3937091.55 frames. ], batch size: 78, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:35:41,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2144930.0, ans=0.1 2024-08-13 11:35:58,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2145130.0, ans=0.125 2024-08-13 11:36:00,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2145130.0, ans=0.0 2024-08-13 11:36:03,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2145130.0, ans=0.2 2024-08-13 11:36:08,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2145130.0, ans=0.0 2024-08-13 11:36:22,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2145230.0, ans=0.0 2024-08-13 11:36:23,474 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 39 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-13 11:36:37,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.430e+01 2.771e+01 3.076e+01 5.105e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 11:36:41,856 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11650, loss[loss=0.09072, beats_loss=0.01119, ecapa_loss=0.0001538, whisper_loss=0.07799, over 16073.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.09135, over 3940616.26 frames. 
], batch size: 61, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:36:42,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2145430.0, ans=0.125 2024-08-13 11:36:58,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-08-13 11:37:04,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2024-08-13 11:37:14,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2024-08-13 11:37:36,237 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 38 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 11:37:36,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2145730.0, ans=0.0 2024-08-13 11:37:57,297 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11700, loss[loss=0.1141, beats_loss=0.008472, ecapa_loss=0.0002132, whisper_loss=0.1035, over 18648.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001644, whisper_loss=0.09213, over 3956984.88 frames. ], batch size: 78, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:38:01,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2145930.0, ans=0.125 2024-08-13 11:38:07,611 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:38:18,284 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 30 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-13 11:38:19,574 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 
23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 11:38:23,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-13 11:38:44,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2146230.0, ans=0.0 2024-08-13 11:39:07,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.516e+01 2.793e+01 3.243e+01 6.496e+01, threshold=5.587e+01, percent-clipped=2.0 2024-08-13 11:39:11,882 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11750, loss[loss=0.08031, beats_loss=0.01366, ecapa_loss=0.0001414, whisper_loss=0.06523, over 17544.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001638, whisper_loss=0.09214, over 3952585.33 frames. ], batch size: 73, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:39:15,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2146430.0, ans=0.0 2024-08-13 11:39:25,107 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 11:39:48,773 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 11:39:57,389 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.05 vs. 
limit=15.0 2024-08-13 11:40:05,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2146730.0, ans=10.0 2024-08-13 11:40:16,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2146830.0, ans=0.2 2024-08-13 11:40:23,412 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11800, loss[loss=0.1378, beats_loss=0.009468, ecapa_loss=0.0001736, whisper_loss=0.1266, over 21465.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001636, whisper_loss=0.09189, over 3928500.60 frames. ], batch size: 88, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:40:46,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2147030.0, ans=0.0 2024-08-13 11:40:50,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2147130.0, ans=0.1 2024-08-13 11:40:55,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2147130.0, ans=0.125 2024-08-13 11:41:04,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2147230.0, ans=0.125 2024-08-13 11:41:08,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2147230.0, ans=0.125 2024-08-13 11:41:25,827 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 
23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-13 11:41:29,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.422e+01 2.679e+01 2.998e+01 8.058e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-13 11:41:33,446 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11850, loss[loss=0.1122, beats_loss=0.009892, ecapa_loss=0.0001789, whisper_loss=0.1005, over 22525.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001632, whisper_loss=0.09152, over 3906663.95 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:41:50,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2147530.0, ans=0.0 2024-08-13 11:41:54,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2147530.0, ans=0.0 2024-08-13 11:42:16,704 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 11:42:18,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2147730.0, ans=0.2 2024-08-13 11:42:21,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2147730.0, ans=0.0 2024-08-13 11:42:22,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2147730.0, ans=0.125 2024-08-13 11:42:27,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=15.0 2024-08-13 11:42:33,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2147830.0, ans=0.125 2024-08-13 11:42:35,637 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5 2024-08-13 11:42:39,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2147830.0, ans=0.0 2024-08-13 11:42:42,676 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11900, loss[loss=0.1038, beats_loss=0.009513, ecapa_loss=0.0001631, whisper_loss=0.09269, over 22336.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001635, whisper_loss=0.09148, over 3905238.41 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:42:51,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2147930.0, ans=0.0 2024-08-13 11:43:11,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2148130.0, ans=0.0 2024-08-13 11:43:18,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2024-08-13 11:43:18,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2024-08-13 11:43:36,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2148330.0, ans=0.0 2024-08-13 11:43:37,289 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 
25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 11:43:47,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.622e+01 2.921e+01 5.658e+01, threshold=5.245e+01, percent-clipped=1.0 2024-08-13 11:43:51,911 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 11950, loss[loss=0.1418, beats_loss=0.008758, ecapa_loss=0.0001679, whisper_loss=0.1314, over 16053.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.000164, whisper_loss=0.09182, over 3877009.34 frames. ], batch size: 61, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:43:58,490 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:44:04,826 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 11:44:10,110 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 11:44:13,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2024-08-13 11:44:16,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=12.0 2024-08-13 11:44:19,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2148630.0, ans=0.09899494936611666 2024-08-13 11:44:21,048 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.647e+01 2024-08-13 11:44:28,639 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 11:44:43,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2148830.0, ans=0.125 2024-08-13 11:44:49,967 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2024-08-13 11:44:56,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2148930.0, ans=0.125 2024-08-13 11:44:57,276 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12000, loss[loss=0.1108, beats_loss=0.01138, ecapa_loss=0.0001526, whisper_loss=0.0979, over 21801.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.09174, over 3858567.55 frames. ], batch size: 88, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:57,277 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 11:45:36,636 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005616, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 11:45:55,698 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on SV_voxceleb1: loss=0.004517, beats_loss=0, ecapa_loss=0.0004517, whisper_loss=0, over 939242.00 frames. 2024-08-13 11:47:56,497 INFO [train_multi_KD3.py:1149] (1/4) Epoch 15, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 11:47:56,501 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 11:47:59,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-13 11:48:08,773 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 
16 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 11:48:13,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2149030.0, ans=0.125 2024-08-13 11:48:14,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 11:48:20,756 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 11:48:21,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2149030.0, ans=0.125 2024-08-13 11:48:34,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2149130.0, ans=0.125 2024-08-13 11:48:53,885 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 11:48:59,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.423e+01 2.671e+01 3.267e+01 7.662e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 11:49:02,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2149430.0, ans=0.09899494936611666 2024-08-13 11:49:03,400 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12050, loss[loss=0.09747, beats_loss=0.01156, ecapa_loss=0.0001399, whisper_loss=0.08452, over 23322.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001637, whisper_loss=0.09134, over 3872194.39 frames. ], batch size: 93, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:49:20,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2149530.0, ans=0.125 2024-08-13 11:49:31,859 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 
27 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-13 11:49:39,702 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 11:49:50,031 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 11:49:50,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2149730.0, ans=0.1 2024-08-13 11:49:53,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2149830.0, ans=0.125 2024-08-13 11:50:07,982 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12100, loss[loss=0.1003, beats_loss=0.01075, ecapa_loss=0.0001603, whisper_loss=0.08792, over 22233.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001647, whisper_loss=0.09117, over 3861716.99 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:50:08,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2149930.0, ans=0.0 2024-08-13 11:50:09,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=15.0 2024-08-13 11:50:12,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2149930.0, ans=0.0 2024-08-13 11:50:21,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2150030.0, ans=0.125 2024-08-13 11:50:25,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2150030.0, ans=0.125 2024-08-13 11:50:48,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2150230.0, ans=0.0 2024-08-13 11:50:49,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2150230.0, ans=0.07 2024-08-13 11:50:59,283 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0 2024-08-13 11:51:01,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2150330.0, ans=0.07 2024-08-13 11:51:05,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2150330.0, ans=0.1 2024-08-13 11:51:08,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.393e+01 2.671e+01 2.986e+01 4.532e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-13 11:51:11,538 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 11:51:12,759 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12150, loss[loss=0.1027, beats_loss=0.01197, ecapa_loss=0.0001483, whisper_loss=0.08922, over 21312.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01083, ecapa_loss=0.0001644, whisper_loss=0.09154, over 3886431.39 frames. 
], batch size: 88, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:51:13,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2150430.0, ans=0.0 2024-08-13 11:51:15,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2150430.0, ans=0.05 2024-08-13 11:51:22,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2150430.0, ans=0.125 2024-08-13 11:51:25,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2150530.0, ans=0.125 2024-08-13 11:51:32,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2150530.0, ans=0.5 2024-08-13 11:51:37,025 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 11:51:54,448 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 11:52:03,831 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 11:52:19,361 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12200, loss[loss=0.1052, beats_loss=0.01026, ecapa_loss=0.0001411, whisper_loss=0.09356, over 18455.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001638, whisper_loss=0.0915, over 3899140.19 frames. 
], batch size: 72, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:52:23,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2150930.0, ans=0.1 2024-08-13 11:52:24,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2150930.0, ans=0.125 2024-08-13 11:52:26,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2150930.0, ans=0.0 2024-08-13 11:52:30,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-08-13 11:52:44,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2151130.0, ans=0.125 2024-08-13 11:52:49,916 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 26 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-13 11:53:02,651 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 11:53:04,247 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:53:13,390 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 11:53:21,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.490e+01 2.780e+01 3.147e+01 4.927e+01, threshold=5.560e+01, percent-clipped=0.0 2024-08-13 11:53:24,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2151430.0, ans=0.0 2024-08-13 11:53:25,087 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12250, loss[loss=0.1038, beats_loss=0.01157, ecapa_loss=0.0001553, whisper_loss=0.09063, over 20768.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01082, ecapa_loss=0.000164, whisper_loss=0.0921, over 3902130.39 frames. ], batch size: 84, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:53:30,534 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 11:53:55,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2151630.0, ans=10.0 2024-08-13 11:53:56,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2151630.0, ans=0.125 2024-08-13 11:54:02,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=15.0 2024-08-13 11:54:16,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2151830.0, ans=0.1 2024-08-13 11:54:30,915 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12300, loss[loss=0.09238, beats_loss=0.01048, ecapa_loss=0.0001556, whisper_loss=0.08035, over 14931.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01077, ecapa_loss=0.0001654, whisper_loss=0.09235, over 3901833.69 frames. ], batch size: 58, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:54:31,105 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 
20 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 11:54:43,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2152030.0, ans=0.0 2024-08-13 11:54:47,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2152030.0, ans=0.125 2024-08-13 11:55:08,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2152130.0, ans=0.0 2024-08-13 11:55:24,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2152330.0, ans=0.1 2024-08-13 11:55:32,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.395e+01 2.675e+01 2.989e+01 4.697e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-13 11:55:33,880 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 11:55:36,413 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12350, loss[loss=0.08611, beats_loss=0.01139, ecapa_loss=0.0001612, whisper_loss=0.07311, over 14801.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01078, ecapa_loss=0.0001663, whisper_loss=0.09221, over 3880920.43 frames. ], batch size: 59, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:55:38,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2152430.0, ans=0.125 2024-08-13 11:55:44,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2152430.0, ans=0.0 2024-08-13 11:55:45,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2152430.0, ans=0.125 2024-08-13 11:55:48,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.33 vs. 
limit=12.0 2024-08-13 11:55:49,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2152530.0, ans=0.0 2024-08-13 11:55:52,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2152530.0, ans=0.125 2024-08-13 11:55:54,111 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0 2024-08-13 11:56:11,905 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2024-08-13 11:56:15,451 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:56:19,278 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 9 from Vox, 35 fro AS 2024-08-13 11:56:20,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2152730.0, ans=0.5 2024-08-13 11:56:23,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2152730.0, ans=0.125 2024-08-13 11:56:40,992 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12400, loss[loss=0.08744, beats_loss=0.01148, ecapa_loss=0.0001902, whisper_loss=0.07406, over 19479.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001643, whisper_loss=0.09226, over 3912905.29 frames. ], batch size: 84, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:56:45,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2024-08-13 11:57:04,618 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 
15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 11:57:11,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2153130.0, ans=0.1 2024-08-13 11:57:24,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2153230.0, ans=0.2 2024-08-13 11:57:26,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-13 11:57:27,284 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 11:57:31,281 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 11:57:34,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2153330.0, ans=0.125 2024-08-13 11:57:43,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.413e+01 2.568e+01 2.884e+01 5.690e+01, threshold=5.135e+01, percent-clipped=1.0 2024-08-13 11:57:43,205 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 11:57:47,369 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12450, loss[loss=0.1133, beats_loss=0.01095, ecapa_loss=0.0001401, whisper_loss=0.101, over 23715.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001643, whisper_loss=0.09112, over 3895034.60 frames. ], batch size: 92, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:58:11,117 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 
25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 11:58:19,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2153630.0, ans=0.0 2024-08-13 11:58:50,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2153830.0, ans=0.0 2024-08-13 11:58:53,014 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12500, loss[loss=0.111, beats_loss=0.008969, ecapa_loss=0.0002081, whisper_loss=0.09996, over 21883.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001643, whisper_loss=0.09151, over 3902109.93 frames. ], batch size: 91, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:58:53,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2153930.0, ans=0.125 2024-08-13 11:58:55,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2153930.0, ans=0.1 2024-08-13 11:59:01,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2153930.0, ans=0.1 2024-08-13 11:59:10,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2154030.0, ans=0.07 2024-08-13 11:59:15,054 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=22.5 2024-08-13 11:59:18,101 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 
22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 11:59:48,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2154330.0, ans=0.2 2024-08-13 11:59:54,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.421e+01 2.658e+01 2.977e+01 4.803e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-13 11:59:57,364 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 29 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-13 11:59:58,488 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12550, loss[loss=0.1057, beats_loss=0.009601, ecapa_loss=0.0002268, whisper_loss=0.09379, over 21861.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001643, whisper_loss=0.09174, over 3918717.78 frames. ], batch size: 93, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:00:00,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2154430.0, ans=0.125 2024-08-13 12:00:19,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2154530.0, ans=0.1 2024-08-13 12:00:29,198 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 12:00:30,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2154630.0, ans=0.04949747468305833 2024-08-13 12:01:00,985 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-13 12:01:04,864 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12600, loss[loss=0.1175, beats_loss=0.008711, ecapa_loss=0.0001902, whisper_loss=0.1069, over 13880.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01074, ecapa_loss=0.0001657, whisper_loss=0.09283, over 3909789.89 frames. 
], batch size: 55, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:01:05,038 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 12:01:10,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2154930.0, ans=0.05 2024-08-13 12:01:11,691 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 12:01:15,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2154930.0, ans=0.125 2024-08-13 12:01:32,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2155130.0, ans=0.07 2024-08-13 12:01:32,726 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=12.0 2024-08-13 12:01:40,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2155130.0, ans=0.0 2024-08-13 12:01:48,193 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 12:01:49,724 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 12:02:06,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.360e+01 2.643e+01 2.873e+01 1.126e+02, threshold=5.286e+01, percent-clipped=2.0 2024-08-13 12:02:07,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2155330.0, ans=0.0 2024-08-13 12:02:10,474 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12650, loss[loss=0.08878, beats_loss=0.01277, ecapa_loss=0.0001991, whisper_loss=0.07402, over 15637.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.000167, whisper_loss=0.09236, over 3892831.78 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:02:13,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2155430.0, ans=0.05 2024-08-13 12:02:17,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2155430.0, ans=0.125 2024-08-13 12:02:17,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2155430.0, ans=0.2 2024-08-13 12:02:19,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2155430.0, ans=0.0 2024-08-13 12:02:29,091 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 17 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 12:02:34,261 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-13 12:02:34,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2155530.0, ans=0.2 2024-08-13 12:02:35,481 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 12:02:42,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2155630.0, ans=0.125 2024-08-13 12:02:46,013 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 12:02:49,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2155730.0, ans=0.125 2024-08-13 12:02:51,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2155730.0, ans=0.1 2024-08-13 12:02:55,984 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 12:03:08,812 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2024-08-13 12:03:14,935 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12700, loss[loss=0.09503, beats_loss=0.01112, ecapa_loss=0.0001593, whisper_loss=0.08231, over 19321.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001664, whisper_loss=0.09163, over 3856705.04 frames. ], batch size: 79, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:03:20,329 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 36 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 12:03:34,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-08-13 12:03:38,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2156030.0, ans=0.125 2024-08-13 12:03:41,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2156130.0, ans=0.125 2024-08-13 12:03:45,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2156130.0, ans=0.1 2024-08-13 12:03:49,000 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
28 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-13 12:04:03,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2156230.0, ans=0.1 2024-08-13 12:04:04,717 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 12:04:09,860 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 21 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-13 12:04:17,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.422e+01 2.694e+01 3.051e+01 5.714e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-13 12:04:19,930 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12750, loss[loss=0.1037, beats_loss=0.01259, ecapa_loss=0.0001534, whisper_loss=0.08955, over 16622.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001668, whisper_loss=0.09151, over 3863634.48 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:04:20,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2156430.0, ans=0.0 2024-08-13 12:04:20,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=12.0 2024-08-13 12:04:22,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. 
limit=15.0 2024-08-13 12:04:25,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2156430.0, ans=0.1 2024-08-13 12:04:34,501 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.886e-01 2024-08-13 12:04:46,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2156630.0, ans=0.2 2024-08-13 12:04:50,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2156630.0, ans=0.0 2024-08-13 12:04:53,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2156630.0, ans=0.0 2024-08-13 12:05:03,872 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 12:05:06,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2156730.0, ans=0.125 2024-08-13 12:05:21,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2156830.0, ans=0.125 2024-08-13 12:05:27,116 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12800, loss[loss=0.0796, beats_loss=0.01538, ecapa_loss=0.0001402, whisper_loss=0.06282, over 20693.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01098, ecapa_loss=0.0001669, whisper_loss=0.0904, over 3849912.39 frames. ], batch size: 87, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:05:28,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2156930.0, ans=0.125 2024-08-13 12:05:34,343 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2024-08-13 12:05:39,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2157030.0, ans=0.07 2024-08-13 12:05:41,884 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 12:05:43,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2157030.0, ans=0.125 2024-08-13 12:06:04,561 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 12:06:26,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2024-08-13 12:06:29,908 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 17 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 12:06:34,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.329e+01 2.579e+01 3.123e+01 7.384e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-13 12:06:35,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=12.0 2024-08-13 12:06:37,257 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12850, loss[loss=0.09769, beats_loss=0.01103, ecapa_loss=0.0001972, whisper_loss=0.08468, over 18879.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01099, ecapa_loss=0.0001675, whisper_loss=0.0903, over 3844277.05 frames. ], batch size: 76, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:06:42,895 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 12:06:46,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2157430.0, ans=0.0 2024-08-13 12:06:49,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2024-08-13 12:07:12,800 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 12:07:14,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2157630.0, ans=0.2 2024-08-13 12:07:19,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2157730.0, ans=0.0 2024-08-13 12:07:21,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2157730.0, ans=0.015 2024-08-13 12:07:49,836 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12900, loss[loss=0.09311, beats_loss=0.01311, ecapa_loss=0.0001962, whisper_loss=0.07803, over 18594.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01097, ecapa_loss=0.0001669, whisper_loss=0.08982, over 3832710.42 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:07:52,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2157930.0, ans=0.0 2024-08-13 12:07:53,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.99 vs. 
limit=15.0
2024-08-13 12:08:10,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2158030.0, ans=0.125
2024-08-13 12:08:33,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2158230.0, ans=0.0
2024-08-13 12:08:47,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=15.0
2024-08-13 12:09:01,807 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.397e+01 2.771e+01 3.216e+01 4.644e+01, threshold=5.541e+01, percent-clipped=0.0
2024-08-13 12:09:05,071 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 12950, loss[loss=0.1012, beats_loss=0.009247, ecapa_loss=0.0001908, whisper_loss=0.09007, over 13510.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01098, ecapa_loss=0.0001657, whisper_loss=0.08959, over 3835509.06 frames. ], batch size: 55, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:09:08,096 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS
2024-08-13 12:09:10,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2158430.0, ans=0.125
2024-08-13 12:09:10,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0
2024-08-13 12:09:14,837 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS
2024-08-13 12:09:22,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2158530.0, ans=0.1
2024-08-13 12:09:25,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2158530.0, ans=0.035
2024-08-13 12:09:27,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2158530.0, ans=0.04949747468305833
2024-08-13 12:09:52,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2158730.0, ans=0.0
2024-08-13 12:10:18,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2158930.0, ans=0.125
2024-08-13 12:10:19,336 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13000, loss[loss=0.1041, beats_loss=0.01116, ecapa_loss=0.0001421, whisper_loss=0.09156, over 14383.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01091, ecapa_loss=0.000166, whisper_loss=0.09034, over 3862851.17 frames. ], batch size: 56, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:10:19,571 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 26 from LS+wenet, 12 from Vox, 48 fro AS
2024-08-13 12:10:22,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2158930.0, ans=0.1
2024-08-13 12:10:27,653 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS
2024-08-13 12:10:27,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2158930.0, ans=0.0
2024-08-13 12:10:40,424 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 fro AS
2024-08-13 12:10:44,405 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-13 12:10:58,146 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=12.0
2024-08-13 12:11:05,865 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0
2024-08-13 12:11:09,843 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 fro AS
2024-08-13 12:11:10,055 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:11:14,960 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:11:29,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2159330.0, ans=0.125
2024-08-13 12:11:30,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.395e+01 2.755e+01 3.311e+01 7.767e+01, threshold=5.510e+01, percent-clipped=1.0
2024-08-13 12:11:33,544 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13050, loss[loss=0.1132, beats_loss=0.009004, ecapa_loss=0.0001914, whisper_loss=0.1023, over 21022.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01092, ecapa_loss=0.0001654, whisper_loss=0.09056, over 3871208.57 frames. ], batch size: 85, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:11:37,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2159430.0, ans=0.2
2024-08-13 12:11:51,224 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS
2024-08-13 12:12:02,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2159530.0, ans=0.0
2024-08-13 12:12:06,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2159630.0, ans=0.125
2024-08-13 12:12:08,354 INFO [train_multi_KD3.py:844] (1/4) A total of 82 cuts. 29 from LS+wenet, 16 from Vox, 37 fro AS
2024-08-13 12:12:18,757 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS
2024-08-13 12:12:24,533 INFO [train_multi_KD3.py:844] (1/4) A total of 61 cuts. 17 from LS+wenet, 23 from Vox, 21 fro AS
2024-08-13 12:12:25,985 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS
2024-08-13 12:12:43,987 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 fro AS
2024-08-13 12:12:46,451 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=12.0
2024-08-13 12:12:50,427 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13100, loss[loss=0.09777, beats_loss=0.01302, ecapa_loss=0.0001549, whisper_loss=0.0832, over 22864.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01094, ecapa_loss=0.0001641, whisper_loss=0.09011, over 3880349.10 frames. ], batch size: 93, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:12:57,059 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-13 12:13:11,453 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS
2024-08-13 12:13:21,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2160030.0, ans=0.025
2024-08-13 12:13:42,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2024-08-13 12:13:46,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2160230.0, ans=0.125
2024-08-13 12:13:46,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0
2024-08-13 12:13:46,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0
2024-08-13 12:13:48,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2160230.0, ans=0.0
2024-08-13 12:14:06,733 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS
2024-08-13 12:14:09,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.515e+01 2.737e+01 3.188e+01 6.948e+01, threshold=5.474e+01, percent-clipped=1.0
2024-08-13 12:14:12,774 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13150, loss[loss=0.09182, beats_loss=0.01358, ecapa_loss=0.0001421, whisper_loss=0.07683, over 23562.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01092, ecapa_loss=0.0001636, whisper_loss=0.09064, over 3905761.25 frames. ], batch size: 92, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:14:16,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2160430.0, ans=0.07
2024-08-13 12:14:29,666 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS
2024-08-13 12:14:48,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2160630.0, ans=0.1
2024-08-13 12:14:50,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2160630.0, ans=0.1
2024-08-13 12:14:53,066 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS
2024-08-13 12:15:17,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0
2024-08-13 12:15:18,368 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS
2024-08-13 12:15:19,751 INFO [train_multi_KD3.py:844] (1/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS
2024-08-13 12:15:32,580 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13200, loss[loss=0.09748, beats_loss=0.01068, ecapa_loss=0.0001671, whisper_loss=0.08513, over 22126.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001656, whisper_loss=0.09121, over 3905871.37 frames. ], batch size: 95, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:15:33,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5
2024-08-13 12:15:39,890 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5
2024-08-13 12:16:04,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2161130.0, ans=0.125
2024-08-13 12:16:05,206 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0
2024-08-13 12:16:17,008 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0
2024-08-13 12:16:19,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2161230.0, ans=0.0
2024-08-13 12:16:28,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2161230.0, ans=0.125
2024-08-13 12:16:41,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2161330.0, ans=0.2
2024-08-13 12:16:44,872 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 37 from LS+wenet, 17 from Vox, 37 fro AS
2024-08-13 12:16:51,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.301e+01 2.587e+01 2.900e+01 9.399e+01, threshold=5.174e+01, percent-clipped=1.0
2024-08-13 12:16:54,505 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13250, loss[loss=0.104, beats_loss=0.008789, ecapa_loss=0.0001895, whisper_loss=0.09336, over 23042.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01081, ecapa_loss=0.0001644, whisper_loss=0.09111, over 3932387.35 frames. ], batch size: 92, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:17:02,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0
2024-08-13 12:17:10,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=12.0
2024-08-13 12:17:17,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2161530.0, ans=0.125
2024-08-13 12:17:23,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=12.0
2024-08-13 12:17:47,045 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0
2024-08-13 12:18:04,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2161830.0, ans=0.1
2024-08-13 12:18:06,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2161830.0, ans=0.05
2024-08-13 12:18:10,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2161830.0, ans=0.125
2024-08-13 12:18:12,221 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13300, loss[loss=0.0848, beats_loss=0.01257, ecapa_loss=0.0001576, whisper_loss=0.07065, over 19651.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001648, whisper_loss=0.09122, over 3899143.30 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:18:12,414 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS
2024-08-13 12:18:16,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2161930.0, ans=0.0
2024-08-13 12:18:19,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2161930.0, ans=0.0
2024-08-13 12:18:29,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2162030.0, ans=0.2
2024-08-13 12:18:32,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-08-13 12:18:36,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2162030.0, ans=0.125
2024-08-13 12:18:36,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2162030.0, ans=0.125
2024-08-13 12:18:49,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2162130.0, ans=0.125
2024-08-13 12:18:59,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2162230.0, ans=0.125
2024-08-13 12:19:03,109 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS
2024-08-13 12:19:03,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2162230.0, ans=0.0
2024-08-13 12:19:07,229 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-13 12:19:29,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.345e+01 2.598e+01 2.972e+01 4.210e+01, threshold=5.195e+01, percent-clipped=0.0
2024-08-13 12:19:33,258 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13350, loss[loss=0.1026, beats_loss=0.01104, ecapa_loss=0.0001349, whisper_loss=0.09025, over 23186.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01072, ecapa_loss=0.0001657, whisper_loss=0.09182, over 3913494.64 frames. ], batch size: 90, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:19:44,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=12.0
2024-08-13 12:19:59,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2162530.0, ans=0.0
2024-08-13 12:20:00,523 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 fro AS
2024-08-13 12:20:04,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2162630.0, ans=0.0
2024-08-13 12:20:04,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2162630.0, ans=0.2
2024-08-13 12:20:06,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2162630.0, ans=15.0
2024-08-13 12:20:23,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2162730.0, ans=0.0
2024-08-13 12:20:28,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2162730.0, ans=0.125
2024-08-13 12:20:32,838 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS
2024-08-13 12:20:50,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13400, loss[loss=0.1107, beats_loss=0.01051, ecapa_loss=0.0001476, whisper_loss=0.09874, over 23782.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001659, whisper_loss=0.09167, over 3900758.79 frames. ], batch size: 91, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:20:56,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2162930.0, ans=0.125
2024-08-13 12:21:07,904 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS
2024-08-13 12:21:08,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2163030.0, ans=0.2
2024-08-13 12:21:17,428 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 fro AS
2024-08-13 12:21:31,668 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 fro AS
2024-08-13 12:21:36,436 INFO [train_multi_KD3.py:844] (1/4) A total of 74 cuts. 18 from LS+wenet, 25 from Vox, 31 fro AS
2024-08-13 12:21:38,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2163230.0, ans=0.0
2024-08-13 12:21:46,219 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS
2024-08-13 12:22:03,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2163330.0, ans=0.125
2024-08-13 12:22:06,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.423e+01 2.749e+01 3.162e+01 4.773e+01, threshold=5.498e+01, percent-clipped=0.0
2024-08-13 12:22:06,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2163330.0, ans=0.125
2024-08-13 12:22:08,844 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13450, loss[loss=0.09592, beats_loss=0.01154, ecapa_loss=0.000186, whisper_loss=0.08252, over 21885.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001646, whisper_loss=0.09138, over 3929087.47 frames. ], batch size: 91, lr: 4.13e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:22:09,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2163430.0, ans=0.125
2024-08-13 12:22:17,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2163430.0, ans=0.125
2024-08-13 12:22:41,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2163630.0, ans=0.1
2024-08-13 12:22:42,308 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS
2024-08-13 12:22:47,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2163630.0, ans=0.0
2024-08-13 12:22:54,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0
2024-08-13 12:23:05,154 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS
2024-08-13 12:23:05,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2163730.0, ans=0.0
2024-08-13 12:23:19,158 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS
2024-08-13 12:23:24,167 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:23:26,664 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13500, loss[loss=0.1071, beats_loss=0.009406, ecapa_loss=0.0001711, whisper_loss=0.096, over 18982.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001648, whisper_loss=0.09195, over 3934251.23 frames. ], batch size: 75, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:23:28,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5
2024-08-13 12:23:43,097 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS
2024-08-13 12:23:57,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2164130.0, ans=0.125
2024-08-13 12:24:05,590 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS
2024-08-13 12:24:07,267 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS
2024-08-13 12:24:18,138 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:24:21,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2164230.0, ans=0.035
2024-08-13 12:24:22,456 INFO [train_multi_KD3.py:844] (1/4) A total of 78 cuts. 18 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-13 12:24:23,898 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 12:24:36,006 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS
2024-08-13 12:24:36,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2164330.0, ans=0.05
2024-08-13 12:24:37,286 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS
2024-08-13 12:24:41,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.326e+01 2.605e+01 3.115e+01 6.571e+01, threshold=5.210e+01, percent-clipped=1.0
2024-08-13 12:24:43,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2164430.0, ans=0.125
2024-08-13 12:24:45,038 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13550, loss[loss=0.09009, beats_loss=0.01314, ecapa_loss=0.0001398, whisper_loss=0.07556, over 14500.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01078, ecapa_loss=0.000164, whisper_loss=0.09154, over 3901338.75 frames. ], batch size: 58, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:24:46,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2164430.0, ans=0.025
2024-08-13 12:25:02,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2164530.0, ans=0.07
2024-08-13 12:25:07,989 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 31 from LS+wenet, 24 from Vox, 30 fro AS
2024-08-13 12:25:17,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2164630.0, ans=0.125
2024-08-13 12:25:56,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2164830.0, ans=0.125
2024-08-13 12:25:58,794 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-13 12:26:02,011 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13600, loss[loss=0.1113, beats_loss=0.01075, ecapa_loss=0.0001714, whisper_loss=0.09886, over 18014.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001625, whisper_loss=0.09184, over 3909005.81 frames. ], batch size: 73, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:26:04,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2164930.0, ans=0.1
2024-08-13 12:26:21,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2165030.0, ans=15.0
2024-08-13 12:26:43,086 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 18 from LS+wenet, 23 from Vox, 48 fro AS
2024-08-13 12:26:46,473 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS
2024-08-13 12:26:53,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2165230.0, ans=0.0
2024-08-13 12:27:06,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2024-08-13 12:27:10,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2165330.0, ans=0.1
2024-08-13 12:27:17,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.537e+01 2.794e+01 3.122e+01 4.623e+01, threshold=5.587e+01, percent-clipped=0.0
2024-08-13 12:27:20,458 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13650, loss[loss=0.08959, beats_loss=0.01418, ecapa_loss=0.0001457, whisper_loss=0.07394, over 21984.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001633, whisper_loss=0.09184, over 3935996.06 frames. ], batch size: 93, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:27:31,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2165430.0, ans=0.125
2024-08-13 12:27:34,129 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 22 from LS+wenet, 10 from Vox, 28 fro AS
2024-08-13 12:27:40,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2165530.0, ans=0.0
2024-08-13 12:28:02,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2165630.0, ans=0.2
2024-08-13 12:28:02,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2165630.0, ans=0.0
2024-08-13 12:28:14,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2165730.0, ans=0.04949747468305833
2024-08-13 12:28:31,429 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS
2024-08-13 12:28:38,040 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13700, loss[loss=0.1094, beats_loss=0.009754, ecapa_loss=0.0001658, whisper_loss=0.098, over 16615.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01096, ecapa_loss=0.0001629, whisper_loss=0.09094, over 3895364.39 frames. ], batch size: 65, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:28:51,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2166030.0, ans=0.125
2024-08-13 12:28:58,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2166030.0, ans=0.125
2024-08-13 12:29:05,598 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS
2024-08-13 12:29:05,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2166030.0, ans=0.125
2024-08-13 12:29:13,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2166130.0, ans=0.2
2024-08-13 12:29:22,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2166230.0, ans=0.0
2024-08-13 12:29:52,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.509e+01 2.844e+01 3.319e+01 7.223e+01, threshold=5.689e+01, percent-clipped=1.0
2024-08-13 12:29:55,322 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13750, loss[loss=0.0882, beats_loss=0.01167, ecapa_loss=0.0001549, whisper_loss=0.07498, over 22325.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01102, ecapa_loss=0.0001613, whisper_loss=0.09087, over 3915850.60 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:30:00,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2166430.0, ans=0.125
2024-08-13 12:30:09,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2166530.0, ans=0.0
2024-08-13 12:30:22,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2166530.0, ans=0.1
2024-08-13 12:30:23,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2166530.0, ans=0.0
2024-08-13 12:30:34,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2166630.0, ans=0.1
2024-08-13 12:30:58,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2166830.0, ans=0.0
2024-08-13 12:31:05,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2166830.0, ans=0.125
2024-08-13 12:31:05,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2166830.0, ans=0.05
2024-08-13 12:31:11,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2166930.0, ans=0.1
2024-08-13 12:31:12,660 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13800, loss[loss=0.1041, beats_loss=0.00846, ecapa_loss=0.0001862, whisper_loss=0.09375, over 17578.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01096, ecapa_loss=0.0001616, whisper_loss=0.09059, over 3885521.90 frames. ], batch size: 70, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:31:14,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2166930.0, ans=0.0
2024-08-13 12:31:25,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2166930.0, ans=0.125
2024-08-13 12:31:45,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2167130.0, ans=0.125
2024-08-13 12:31:54,035 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0
2024-08-13 12:32:06,555 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS
2024-08-13 12:32:12,731 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 fro AS
2024-08-13 12:32:22,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2167330.0, ans=15.0
2024-08-13 12:32:26,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.343e+01 2.633e+01 2.825e+01 4.077e+01, threshold=5.266e+01, percent-clipped=0.0
2024-08-13 12:32:30,099 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13850, loss[loss=0.109, beats_loss=0.01118, ecapa_loss=0.0001771, whisper_loss=0.09601, over 22590.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001625, whisper_loss=0.09128, over 3871111.92 frames. ], batch size: 93, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:32:52,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2167530.0, ans=0.04949747468305833
2024-08-13 12:33:07,405 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS
2024-08-13 12:33:41,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2167830.0, ans=0.1
2024-08-13 12:33:47,692 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13900, loss[loss=0.1165, beats_loss=0.01196, ecapa_loss=0.0001411, whisper_loss=0.1032, over 23486.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.000163, whisper_loss=0.09219, over 3902650.70 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:33:49,542 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS
2024-08-13 12:34:27,086 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS
2024-08-13 12:34:27,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2168130.0, ans=0.0
2024-08-13 12:34:39,581 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 19 from LS+wenet, 24 from Vox, 21 fro AS
2024-08-13 12:34:39,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2168230.0, ans=0.125
2024-08-13 12:34:40,906 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS
2024-08-13 12:34:46,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2168230.0, ans=0.0
2024-08-13 12:34:47,427 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS
2024-08-13 12:34:48,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2024-08-13 12:34:51,610 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS
2024-08-13 12:35:02,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.480e+01 2.802e+01 3.173e+01 5.254e+01, threshold=5.604e+01, percent-clipped=0.0
2024-08-13 12:35:05,020 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 13950, loss[loss=0.1009, beats_loss=0.009835, ecapa_loss=0.0001587, whisper_loss=0.0895, over 22143.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001619, whisper_loss=0.09224, over 3898878.32 frames. ], batch size: 91, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:35:12,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2168430.0, ans=0.125
2024-08-13 12:35:16,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2168430.0, ans=0.125
2024-08-13 12:35:45,783 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 35 from LS+wenet, 16 from Vox, 30 fro AS
2024-08-13 12:35:47,170 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS
2024-08-13 12:36:22,584 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS
2024-08-13 12:36:27,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2168830.0, ans=0.1
2024-08-13 12:36:31,142 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14000, loss[loss=0.1228, beats_loss=0.00902, ecapa_loss=0.0001646, whisper_loss=0.1122, over 23374.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001605, whisper_loss=0.09185, over 3883649.89 frames. ], batch size: 90, lr: 4.12e-03, grad_scale: 5.764607523034235e+17
2024-08-13 12:36:45,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2168930.0, ans=0.025
2024-08-13 12:36:47,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2169030.0, ans=0.0
2024-08-13 12:36:54,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2169030.0, ans=0.125
2024-08-13 12:37:07,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.42 vs. limit=10.0
2024-08-13 12:37:21,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.30 vs. limit=22.5
2024-08-13 12:37:37,650 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 fro AS
2024-08-13 12:37:49,987 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS
2024-08-13 12:37:56,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.610e+01 2.869e+01 3.326e+01 4.545e+01, threshold=5.739e+01, percent-clipped=0.0
2024-08-13 12:38:02,089 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14050, loss[loss=0.1088, beats_loss=0.009734, ecapa_loss=0.0001614, whisper_loss=0.09746, over 20058.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01074, ecapa_loss=0.0001606, whisper_loss=0.09296, over 3880612.20 frames.
], batch size: 80, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:38:15,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2169430.0, ans=0.2 2024-08-13 12:38:17,913 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.668e+01 2024-08-13 12:38:27,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-13 12:38:31,918 INFO [train_multi_KD3.py:844] (1/4) A total of 62 cuts. 24 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-13 12:38:34,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-08-13 12:38:46,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2169530.0, ans=0.1 2024-08-13 12:39:35,602 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 17 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-13 12:39:39,563 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 12:39:41,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2169830.0, ans=0.125 2024-08-13 12:39:42,631 INFO [train_multi_KD3.py:844] (1/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 12:39:48,352 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14100, loss[loss=0.1063, beats_loss=0.01007, ecapa_loss=0.0001828, whisper_loss=0.09436, over 17776.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01074, ecapa_loss=0.0001618, whisper_loss=0.09273, over 3866131.24 frames. 
], batch size: 70, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:39:56,877 INFO [train_multi_KD3.py:844] (1/4) A total of 60 cuts. 26 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-13 12:40:22,109 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 27 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-13 12:40:28,611 INFO [train_multi_KD3.py:844] (1/4) A total of 56 cuts. 17 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-13 12:40:33,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2170130.0, ans=0.0 2024-08-13 12:40:50,504 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 12:40:50,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2170130.0, ans=10.0 2024-08-13 12:41:39,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.420e+01 2.685e+01 3.019e+01 4.436e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 12:41:45,786 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14150, loss[loss=0.1221, beats_loss=0.009041, ecapa_loss=0.0001878, whisper_loss=0.1112, over 19185.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01073, ecapa_loss=0.0001622, whisper_loss=0.09306, over 3872859.18 frames. ], batch size: 77, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:41:49,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2170430.0, ans=0.0 2024-08-13 12:42:42,662 INFO [train_multi_KD3.py:844] (1/4) A total of 83 cuts. 
22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 12:42:54,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2170730.0, ans=0.125 2024-08-13 12:43:46,021 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14200, loss[loss=0.1019, beats_loss=0.01029, ecapa_loss=0.0001913, whisper_loss=0.08972, over 21258.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01069, ecapa_loss=0.0001629, whisper_loss=0.09311, over 3892421.30 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:44:45,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2171130.0, ans=0.125 2024-08-13 12:45:03,316 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 12:45:31,796 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:45:36,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2171330.0, ans=0.125 2024-08-13 12:45:43,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.405e+01 2.774e+01 3.077e+01 4.390e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 12:45:49,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14250, loss[loss=0.1033, beats_loss=0.009302, ecapa_loss=0.0002145, whisper_loss=0.09184, over 17154.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001627, whisper_loss=0.09192, over 3878954.44 frames. ], batch size: 72, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:45:50,873 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 
15 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 12:46:19,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2171530.0, ans=0.125 2024-08-13 12:46:29,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2024-08-13 12:46:44,908 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 12:46:51,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2171730.0, ans=0.5 2024-08-13 12:46:59,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-13 12:47:01,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-13 12:47:02,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=12.0 2024-08-13 12:47:12,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2171930.0, ans=0.125 2024-08-13 12:47:13,357 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14300, loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.000143, whisper_loss=0.09012, over 16974.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.000163, whisper_loss=0.0913, over 3889047.75 frames. 
], batch size: 67, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:47:34,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2172030.0, ans=0.125 2024-08-13 12:47:42,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2172030.0, ans=0.125 2024-08-13 12:47:46,361 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 19 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 12:47:49,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5 2024-08-13 12:48:09,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2172230.0, ans=0.0 2024-08-13 12:48:29,038 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 12:48:30,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.493e+01 2.701e+01 3.105e+01 1.229e+02, threshold=5.402e+01, percent-clipped=5.0 2024-08-13 12:48:34,913 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14350, loss[loss=0.05973, beats_loss=0.01207, ecapa_loss=0.0001461, whisper_loss=0.0462, over 15325.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001633, whisper_loss=0.09092, over 3885861.02 frames. ], batch size: 64, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:48:40,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2172430.0, ans=0.0 2024-08-13 12:48:40,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=12.0 2024-08-13 12:48:52,126 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 12:48:54,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2172530.0, ans=0.2 2024-08-13 12:48:54,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-13 12:48:55,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2172530.0, ans=0.2 2024-08-13 12:49:10,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-13 12:49:39,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2172830.0, ans=0.1 2024-08-13 12:49:39,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2172830.0, ans=0.125 2024-08-13 12:49:45,717 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-13 12:49:54,126 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14400, loss[loss=0.1043, beats_loss=0.008984, ecapa_loss=0.0002192, whisper_loss=0.09317, over 17976.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001648, whisper_loss=0.09115, over 3866738.20 frames. ], batch size: 74, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:50:04,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5 2024-08-13 12:50:09,612 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 12:50:15,292 INFO [train_multi_KD3.py:844] (1/4) A total of 71 cuts. 31 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 12:50:30,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2173130.0, ans=0.0 2024-08-13 12:50:31,998 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 12:50:32,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=12.0 2024-08-13 12:50:38,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2173130.0, ans=0.125 2024-08-13 12:50:41,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=15.0 2024-08-13 12:50:51,704 INFO [train_multi_KD3.py:844] (1/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 12:51:13,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.394e+01 2.605e+01 2.942e+01 4.760e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 12:51:17,137 INFO [train_multi_KD3.py:1116] (1/4) Epoch 15, batch 14450, loss[loss=0.1115, beats_loss=0.01099, ecapa_loss=0.000181, whisper_loss=0.09872, over 23054.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01082, ecapa_loss=0.0001644, whisper_loss=0.09165, over 3887298.14 frames. ], batch size: 96, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:51:23,521 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 12:51:29,694 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 
15 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-13 12:51:48,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2173630.0, ans=0.0 2024-08-13 12:52:46,799 INFO [train_multi_KD3.py:844] (1/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 12:52:47,920 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 0, loss[loss=0.09328, beats_loss=0.01019, ecapa_loss=0.0001849, whisper_loss=0.08123, over 21861.00 frames. ], tot_loss[loss=0.09328, beats_loss=0.01019, ecapa_loss=0.0001849, whisper_loss=0.08123, over 21861.00 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:52:47,920 INFO [train_multi_KD3.py:1139] (1/4) Computing validation loss 2024-08-13 12:53:29,389 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005644, whisper_loss=0.2485, over 922467.00 frames. 2024-08-13 12:53:45,210 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on SV_voxceleb1: loss=0.00454, beats_loss=0, ecapa_loss=0.000454, whisper_loss=0, over 939242.00 frames. 2024-08-13 12:55:41,358 INFO [train_multi_KD3.py:1149] (1/4) Epoch 16, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 12:55:41,361 INFO [train_multi_KD3.py:1155] (1/4) Maximum memory allocated so far is 31130MB 2024-08-13 12:55:41,518 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 12:55:58,132 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.864e-01 2024-08-13 12:56:12,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2173910.0, ans=0.2 2024-08-13 12:56:17,750 INFO [train_multi_KD3.py:844] (1/4) A total of 70 cuts. 
19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 12:56:28,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2173910.0, ans=0.125 2024-08-13 12:57:00,948 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.972e+00 2024-08-13 12:57:06,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2174110.0, ans=0.125 2024-08-13 12:57:18,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2174110.0, ans=0.0 2024-08-13 12:57:47,606 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 50, loss[loss=0.1317, beats_loss=0.009037, ecapa_loss=0.0001316, whisper_loss=0.1213, over 24591.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.009866, ecapa_loss=0.0001704, whisper_loss=0.0911, over 882075.26 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:58:07,450 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 12:58:08,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2174310.0, ans=0.2 2024-08-13 12:58:11,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.946e+01 3.270e+01 5.312e+01, threshold=5.891e+01, percent-clipped=1.0 2024-08-13 12:58:42,752 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 12:58:46,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2024-08-13 12:59:21,751 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 
20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 12:59:36,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2174710.0, ans=0.125 2024-08-13 12:59:38,518 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.26 vs. limit=10.0 2024-08-13 12:59:43,499 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 100, loss[loss=0.09516, beats_loss=0.01017, ecapa_loss=0.0001287, whisper_loss=0.0837, over 14583.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009762, ecapa_loss=0.0001694, whisper_loss=0.0903, over 1522125.44 frames. ], batch size: 56, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:59:52,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2174810.0, ans=0.0 2024-08-13 13:00:05,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-08-13 13:00:29,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-13 13:00:31,154 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 13:00:43,305 INFO [train_multi_KD3.py:844] (1/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 13:01:06,857 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 13:01:15,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-13 13:01:27,656 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 13:01:34,211 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 150, loss[loss=0.097, beats_loss=0.01101, ecapa_loss=0.0001616, whisper_loss=0.08437, over 21542.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.009906, ecapa_loss=0.0001669, whisper_loss=0.09185, over 2041415.01 frames. ], batch size: 86, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:01:35,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2175310.0, ans=0.125 2024-08-13 13:01:35,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-08-13 13:01:52,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.691e+01 2.914e+01 3.205e+01 4.939e+01, threshold=5.827e+01, percent-clipped=0.0 2024-08-13 13:02:06,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2175410.0, ans=0.1 2024-08-13 13:02:19,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2175510.0, ans=0.2 2024-08-13 13:02:25,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 13:02:41,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2024-08-13 13:02:43,428 INFO [train_multi_KD3.py:844] (1/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 13:02:57,005 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 200, loss[loss=0.1053, beats_loss=0.009888, ecapa_loss=0.0001868, whisper_loss=0.09356, over 17710.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01011, ecapa_loss=0.0001662, whisper_loss=0.09218, over 2431984.50 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:02:59,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-13 13:03:06,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2175810.0, ans=0.0 2024-08-13 13:03:09,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2024-08-13 13:03:10,653 INFO [train_multi_KD3.py:844] (1/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 13:03:21,917 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 13:03:28,155 INFO [train_multi_KD3.py:844] (1/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 13:03:28,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-08-13 13:03:32,702 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=8.0 2024-08-13 13:03:36,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2176010.0, ans=0.2 2024-08-13 13:03:54,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2176110.0, ans=0.125 2024-08-13 13:04:13,408 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 250, loss[loss=0.133, beats_loss=0.008646, ecapa_loss=0.000174, whisper_loss=0.1226, over 21314.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01025, ecapa_loss=0.0001646, whisper_loss=0.09283, over 2747773.77 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:04:13,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2176310.0, ans=0.0 2024-08-13 13:04:14,946 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 13:04:27,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+01 2.289e+01 2.601e+01 2.843e+01 4.467e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 13:04:28,965 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 13:04:33,737 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 13:04:35,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2176410.0, ans=0.125 2024-08-13 13:04:44,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2176510.0, ans=0.2 2024-08-13 13:04:45,304 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 13:04:47,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2176510.0, ans=0.125 2024-08-13 13:05:11,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2176710.0, ans=0.2 2024-08-13 13:05:19,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2176710.0, ans=0.125 2024-08-13 13:05:25,190 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 300, loss[loss=0.09201, beats_loss=0.01419, ecapa_loss=0.0001239, whisper_loss=0.07659, over 14756.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001636, whisper_loss=0.09154, over 2958551.17 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:05:26,673 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 13:05:31,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2176810.0, ans=0.0 2024-08-13 13:05:51,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2176910.0, ans=0.125 2024-08-13 13:06:04,014 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 13:06:10,098 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 13:06:10,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2177110.0, ans=0.125 2024-08-13 13:06:14,478 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 13:06:38,057 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 350, loss[loss=0.1096, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.09771, over 22379.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001635, whisper_loss=0.09122, over 3142248.15 frames. ], batch size: 86, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:06:45,866 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 
30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 13:06:52,658 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.376e+01 2.584e+01 2.917e+01 1.097e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-13 13:07:28,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2177610.0, ans=0.2 2024-08-13 13:07:47,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2177710.0, ans=0.0 2024-08-13 13:07:51,027 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 400, loss[loss=0.1134, beats_loss=0.01003, ecapa_loss=0.0001745, whisper_loss=0.1016, over 14268.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001631, whisper_loss=0.09122, over 3308085.86 frames. ], batch size: 58, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:07:51,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2177810.0, ans=0.2 2024-08-13 13:07:51,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2177810.0, ans=0.1 2024-08-13 13:08:06,833 INFO [train_multi_KD3.py:844] (1/4) A total of 55 cuts. 
18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 13:08:21,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2178010.0, ans=0.125 2024-08-13 13:08:47,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2178210.0, ans=0.2 2024-08-13 13:08:48,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2178210.0, ans=0.0 2024-08-13 13:09:02,110 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 450, loss[loss=0.09865, beats_loss=0.01277, ecapa_loss=0.0001144, whisper_loss=0.08474, over 19268.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001622, whisper_loss=0.09034, over 3430773.31 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:09:16,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.363e+01 2.643e+01 2.945e+01 6.968e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-13 13:10:08,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2178710.0, ans=0.0 2024-08-13 13:10:14,120 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 500, loss[loss=0.09129, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.07936, over 18080.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001636, whisper_loss=0.09067, over 3508419.41 frames. ], batch size: 70, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:10:22,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2178810.0, ans=0.125 2024-08-13 13:10:24,723 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 
32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 13:10:34,976 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 13:11:18,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2179210.0, ans=0.125 2024-08-13 13:11:18,738 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2179210.0, ans=0.2 2024-08-13 13:11:27,236 INFO [train_multi_KD3.py:844] (1/4) A total of 63 cuts. 14 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 13:11:28,321 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 550, loss[loss=0.0777, beats_loss=0.01286, ecapa_loss=0.0001345, whisper_loss=0.0635, over 16098.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001629, whisper_loss=0.08971, over 3637085.71 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:11:39,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2179310.0, ans=0.1 2024-08-13 13:11:43,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.371e+01 2.596e+01 2.960e+01 4.995e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-13 13:11:43,896 INFO [train_multi_KD3.py:844] (1/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 13:12:05,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2179510.0, ans=0.0 2024-08-13 13:12:40,849 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 600, loss[loss=0.1073, beats_loss=0.01037, ecapa_loss=0.0001528, whisper_loss=0.09542, over 21003.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001623, whisper_loss=0.08998, over 3661940.89 frames. 
], batch size: 84, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:12:45,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2179810.0, ans=0.2 2024-08-13 13:12:46,694 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 13:12:47,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-08-13 13:12:55,678 INFO [train_multi_KD3.py:844] (1/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 13:13:12,142 INFO [train_multi_KD3.py:844] (1/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 13:13:12,570 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-13 13:13:15,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2180010.0, ans=0.125 2024-08-13 13:13:17,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-13 13:13:37,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2180110.0, ans=0.1 2024-08-13 13:13:48,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2180210.0, ans=0.125 2024-08-13 13:13:53,487 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 650, loss[loss=0.1086, beats_loss=0.01147, ecapa_loss=0.0001642, whisper_loss=0.09552, over 21702.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001625, whisper_loss=0.08981, over 3713753.10 frames. 
], batch size: 85, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:14:08,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.791e+01 3.201e+01 6.340e+01, threshold=5.582e+01, percent-clipped=1.0 2024-08-13 13:14:11,347 INFO [train_multi_KD3.py:844] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 13:14:16,001 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 13:14:16,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2180410.0, ans=0.0 2024-08-13 13:14:17,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2180410.0, ans=0.015 2024-08-13 13:14:29,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2180510.0, ans=0.0 2024-08-13 13:14:57,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-08-13 13:15:04,106 INFO [train_multi_KD3.py:844] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 13:15:06,854 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 700, loss[loss=0.108, beats_loss=0.01066, ecapa_loss=0.0001635, whisper_loss=0.09574, over 18718.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001634, whisper_loss=0.0905, over 3731976.10 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:15:08,924 INFO [train_multi_KD3.py:844] (1/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 13:15:09,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2180810.0, ans=0.035 2024-08-13 13:15:10,464 INFO [train_multi_KD3.py:844] (1/4) A total of 81 cuts. 
30 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 13:15:20,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2180910.0, ans=0.2 2024-08-13 13:15:23,012 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 13:15:26,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2180910.0, ans=0.1 2024-08-13 13:15:42,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-13 13:15:46,109 INFO [train_multi_KD3.py:844] (1/4) A total of 77 cuts. 27 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-13 13:16:08,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2181210.0, ans=0.125 2024-08-13 13:16:21,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2181310.0, ans=0.125 2024-08-13 13:16:22,129 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 750, loss[loss=0.1031, beats_loss=0.01135, ecapa_loss=0.0001735, whisper_loss=0.08999, over 16702.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001628, whisper_loss=0.09069, over 3733231.62 frames. ], batch size: 66, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:16:30,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2181310.0, ans=0.125 2024-08-13 13:16:32,085 INFO [train_multi_KD3.py:844] (1/4) A total of 73 cuts. 
17 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 13:16:37,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.319e+01 2.745e+01 2.985e+01 9.286e+01, threshold=5.489e+01, percent-clipped=1.0 2024-08-13 13:16:51,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2181510.0, ans=0.0 2024-08-13 13:16:55,978 INFO [train_multi_KD3.py:844] (1/4) A total of 66 cuts. 15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 13:17:09,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2181610.0, ans=0.0 2024-08-13 13:17:10,825 INFO [train_multi_KD3.py:844] (1/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 13:17:15,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2181610.0, ans=0.0 2024-08-13 13:17:37,908 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 800, loss[loss=0.08504, beats_loss=0.009293, ecapa_loss=0.0001494, whisper_loss=0.07426, over 17512.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001623, whisper_loss=0.08998, over 3748397.77 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:17:52,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-08-13 13:18:05,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2181910.0, ans=0.0 2024-08-13 13:18:07,125 INFO [train_multi_KD3.py:844] (1/4) A total of 59 cuts. 14 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-13 13:18:23,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. 
limit=10.0 2024-08-13 13:18:24,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2182110.0, ans=0.0 2024-08-13 13:18:25,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2182110.0, ans=0.025 2024-08-13 13:18:43,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2182210.0, ans=0.125 2024-08-13 13:18:52,707 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 850, loss[loss=0.1042, beats_loss=0.01132, ecapa_loss=0.000147, whisper_loss=0.09142, over 17800.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001602, whisper_loss=0.08968, over 3786628.47 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:19:08,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.299e+01 2.538e+01 2.916e+01 7.643e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-13 13:19:09,834 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 13:19:26,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2182510.0, ans=0.05 2024-08-13 13:19:35,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2182510.0, ans=10.0 2024-08-13 13:19:39,176 INFO [train_multi_KD3.py:844] (1/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 13:20:07,945 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 900, loss[loss=0.1153, beats_loss=0.009394, ecapa_loss=0.0001665, whisper_loss=0.1042, over 17080.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0107, ecapa_loss=0.0001598, whisper_loss=0.08972, over 3811792.03 frames. 
], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:20:15,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2024-08-13 13:20:22,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2182910.0, ans=0.125 2024-08-13 13:20:26,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2182910.0, ans=0.04949747468305833 2024-08-13 13:20:27,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2182910.0, ans=15.0 2024-08-13 13:20:29,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2182910.0, ans=0.0 2024-08-13 13:20:41,255 INFO [train_multi_KD3.py:844] (1/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-13 13:20:51,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2183010.0, ans=0.0 2024-08-13 13:21:08,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2183110.0, ans=0.0 2024-08-13 13:21:25,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2183210.0, ans=0.125 2024-08-13 13:21:29,298 INFO [train_multi_KD3.py:844] (1/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 13:21:34,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2183310.0, ans=0.05 2024-08-13 13:21:35,563 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 950, loss[loss=0.1238, beats_loss=0.01032, ecapa_loss=0.0001482, whisper_loss=0.112, over 24530.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001582, whisper_loss=0.08961, over 3822478.95 frames. ], batch size: 91, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:21:40,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2183310.0, ans=15.0 2024-08-13 13:21:53,200 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.345e+01 2.599e+01 2.801e+01 4.371e+01, threshold=5.198e+01, percent-clipped=0.0 2024-08-13 13:21:57,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2183410.0, ans=0.125 2024-08-13 13:22:01,232 INFO [train_multi_KD3.py:844] (1/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 13:22:01,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2183410.0, ans=0.0 2024-08-13 13:22:06,115 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-13 13:22:15,037 INFO [train_multi_KD3.py:844] (1/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 13:22:16,462 INFO [train_multi_KD3.py:844] (1/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 13:22:16,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2183510.0, ans=0.125 2024-08-13 13:22:37,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2183610.0, ans=0.2 2024-08-13 13:22:40,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.86 vs. 
limit=15.0 2024-08-13 13:23:04,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.23 vs. limit=22.5 2024-08-13 13:23:14,097 INFO [train_multi_KD3.py:1116] (1/4) Epoch 16, batch 1000, loss[loss=0.09965, beats_loss=0.01021, ecapa_loss=0.0001294, whisper_loss=0.08815, over 19505.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001592, whisper_loss=0.09044, over 3821037.32 frames. ], batch size: 74, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:23:20,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2183810.0, ans=0.125 2024-08-13 13:23:45,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2183910.0, ans=0.2 2024-08-13 13:23:53,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2183910.0, ans=0.125 2024-08-13 13:24:10,275 INFO [train_multi_KD3.py:844] (1/4) A total of 68 cuts. 27 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 13:24:13,585 INFO [train_multi_KD3.py:844] (1/4) A total of 80 cuts. 23 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 13:24:58,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2184110.0, ans=0.125 2024-08-13 13:25:18,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2184210.0, ans=0.2